DECONVOLUTION WITH UNKNOWN ERROR DISTRIBUTION

By Jan Johannes 

Ruprecht-Karls-Universität Heidelberg

We consider the problem of estimating a density $f_X$ using a sample
$Y_1,\dots,Y_n$ from $f_Y = f_X \star f_\epsilon$, where $f_\epsilon$ is an
unknown density. We assume that an additional sample
$\epsilon_1,\dots,\epsilon_m$ from $f_\epsilon$ is observed. Estimators of
$f_X$ and its derivatives are constructed by using nonparametric estimators
of $f_Y$ and $f_\epsilon$ and by applying a spectral cut-off in the Fourier
domain. We derive the rate of convergence of the estimators in case of a
known and unknown error density $f_\epsilon$, where it is assumed that $f_X$
satisfies a polynomial, logarithmic or general source condition. It is shown
that the proposed estimators are asymptotically optimal in a minimax sense
in the models with known or unknown error density, if the density $f_X$
belongs to a Sobolev space $H_p$ and $f_\epsilon$ is ordinary smooth or
supersmooth.

1. Introduction. Let $X$ and $\epsilon$ be independent random variables with
unknown density functions $f_X$ and $f_\epsilon$, respectively. The objective
is to nonparametrically estimate the density function $f_X$ and its
derivatives based on a sample of $Y = X + \epsilon$. In this setting, the
density $f_Y$ of $Y$ is the convolution of the density of interest, $f_X$,
and the density $f_\epsilon$ of the additive noise, that is,

(1.1)  $f_Y(y) = (f_X \star f_\epsilon)(y) := \int_{-\infty}^{\infty} f_X(x) f_\epsilon(y - x)\,dx.$

Suppose we observe $Y_1,\dots,Y_n$ from $f_Y$ and the error density
$f_\epsilon$ is known. Then, the estimation of the deconvolution density
$f_X$ is a classical problem in statistics. The most popular approach is to
estimate $f_Y$ by a kernel estimator and then solve (1.1) using a Fourier
transform (see Carroll and Hall [4], Devroye [7], Efromovich [9],
Fan [11, 12], Stefanski [36], Zhang [41], Goldenshluger [14, 15] and Kim and
Koo [21]). Spline-based methods are considered, for example, in Mendelsohn
and Rice [28] and Koo and Park [22]. The estimation of the deconvolution
density using a wavelet decomposition is studied in Pensky and
Vidakovic [34], Fan and Koo [13] and Bigot and Van Bellegem [1], while Hall
and Qiu [16] have proposed a discrete Fourier series expansion. A
penalization and projection approach is proposed in Carrasco and Florens [3]
and Comte, Rozenholc and Taupin [6].

The underlying idea behind all approaches is to replace in (1.1) the unknown
density $f_Y$ by its estimator and then solve (1.1). However, solving (1.1)
leads to an ill-posed inverse problem and, hence, the inversion of (1.1) has
to be "regularized" in some way. We now describe three examples of
regularization. The first example is kernel estimators, where the kernel has
a limited bandwidth, that is, the Fourier transform of the kernel has a
bounded support. In this case, asymptotic optimality, both pointwise and
global, over a class of functions whose derivatives are Lipschitz
continuous, is proven in Carroll and Hall [4] and Fan [11, 12]. The second
example is estimators based on a wavelet decomposition, where the wavelets
have limited bandwidths. For the wavelet estimator, Pensky and
Vidakovic [34] show asymptotic optimality of the mean integrated squared
error (MISE) over the Sobolev space $H_p$, which describes the level of
smoothness of a function $f$ in terms of its Fourier transform
$\mathcal{F}f$. In the third example, the risk in the Sobolev norm of $H_s$
($H_s$-risk) and asymptotic optimality over $H_p$, $p \ge s$, of an
estimator using a spectral cut-off (thresholding of the Fourier transform
$\mathcal{F}f_\epsilon$ of $f_\epsilon$) is derived in Mair and
Ruymgaart [26].

However, in the above examples, $f_X$ and $f_\epsilon$ are assumed to be
ordinary smooth or supersmooth, that is, their Fourier transforms have
polynomial or exponential descent. All these cases can be characterized by a
"source condition" (defined below), which allows for more general tail
behavior.

In several applications, for example, in optics and medicine (cf.
Tessier [38] and Levitt [23]), the noise density $f_\epsilon$ may be
unknown. In this case, without any additional information, the density $f_X$
cannot be recovered from the density $f_Y$ through (1.1), that is, the
density $f_X$ is not identified if only a sample $Y_1,\dots,Y_n$ from $f_Y$
is observed. It is worth noting that in some special cases the deconvolution
density $f_X$ can be identified (cf. Butucea and Matias [2] or Meister [27]).
Deconvolution without prior knowledge of the error distribution is also
possible in the case of panel data (cf. Horowitz and Markatou [19], Hall and
Yao [17] or Neumann [32]).

In this paper, we deal with the estimation of a deconvolution density $f_X$
when only an approximation of the error density $f_\epsilon$ is given. More
precisely, following Diggle and Hall [8] we suppose that, in addition to a
sample $Y_1,\dots,Y_n$ from $f_Y$, we observe a sample
$\epsilon_1,\dots,\epsilon_m$ from $f_\epsilon$. An interesting example in
bioinformatics can be found in the analysis of cDNA microarrays, where $Y$
is the intensity measure, $X$ is the expressed gene intensity and $\epsilon$
is the background intensity (for details see Havilio [18]). In a situation
where an estimator of $f_\epsilon$ is used, rather than the true density,
Neumann [31] shows asymptotic optimality of the MISE over the
Bessel-potential space when the error density is ordinary smooth. In case of
a circular convolution problem, Cavalier and Hengartner [5] present oracle
inequalities and adaptive estimation. However, they also assume the error
density to be ordinary smooth. By constraining the error density to be
ordinary smooth, a rich class of distributions, such as the normal
distribution, is excluded. The purpose of this paper is to propose and study
a deconvolution scheme which has enough flexibility to allow a wide range of
tail behaviors of $\mathcal{F}f_X$ and $\mathcal{F}f_\epsilon$.
The estimators of the deconvolution density considered in this paper are
based on a regularized inversion of (1.1) using a spectral cut-off, where we
replace the unknown density $f_Y$ by a nonparametric estimator and the
Fourier transform of $f_\epsilon$ by its empirical counterpart. We derive
the $H_s$-risk of the proposed estimator for a wide class of density
functions, which unifies and generalizes many of the previous results for
known and unknown error density. Roughly speaking, we show in case of known
$f_\epsilon$ that the $H_s$-risk can be decomposed into a function of the
MISE of the nonparametric estimator of $f_Y$ plus an additional bias term
which is a function of the threshold (the parameter which determines the
spectral cut-off point). The relationship between $\mathcal{F}f_X$ and
$\mathcal{F}f_\epsilon$ then essentially determines the functional form of
the bias term. For example, the bias is a logarithm of the threshold when
the error distribution is supersmooth (e.g., normal) and $f_X$ is ordinary
smooth (e.g., double exponential). On the other hand, if both the error
distribution and $f_X$ are ordinary smooth or supersmooth, the bias is a
polynomial of the threshold. We show that the theory behind these rates can
be unified using an index function $\kappa$ (cf. Nair, Pereverzev and
Tautenhahn [29]), which "links" the tail behavior of $\mathcal{F}f_X$ and
$\mathcal{F}f_\epsilon$, by supposing that
$|\mathcal{F}f_X|^2/\kappa(|\mathcal{F}f_\epsilon|^2)$ is integrable.

Under certain conditions on the index function, we prove that the
$H_s$-risk in the model with unknown $f_\epsilon$ can be decomposed into a
part with the same bound as the $H_s$-risk for known $f_\epsilon$ and a
second term which is only a function of the sample size $m$ (of errors
$\epsilon$). The functional form of the second term is then again determined
by the relationship between $\mathcal{F}f_X$ and $\mathcal{F}f_\epsilon$. We
show that the second term provides a lower bound for the $H_s$-risk on its
own and, hence, cannot be avoided. It follows that the estimator is minimax
in the model with unknown $f_\epsilon$ when the bound of the $H_s$-risk for
known $f_\epsilon$ is of minimax optimal order. Furthermore, it is of
interest to compare the rates of convergence of the $H_s$-risk when the
density $f_\epsilon$ is estimated with the rates where $f_\epsilon$ is
known. We show that under certain conditions on the index function, a sample
size $m$ which increases at least as fast as the inverse of the MISE of the
nonparametric estimator of $f_Y$ ensures an asymptotically negligible
estimation error of $f_\epsilon$. However, in special cases even slower
rates of $m$ are enough.

In this paper, we use the classical Rosenblatt-Parzen kernel estimator
(cf. Parzen [33]) without a limited bandwidth to estimate the density $f_Y$.
However, since the $H_s$-risk of the proposed estimator can be decomposed
using the MISE of the density estimator of $f_Y$, any other nonparametric
estimation method (e.g., based on splines or wavelets) can be used and the
theory still holds.

The paper is organized in the following way. In Section 2, we give a brief
description of the background of the methodology and we define the estimator
of $f_X$ when the density $f_\epsilon$ is known as well as when
$f_\epsilon$ is unknown. We investigate the asymptotic behavior of the
estimator of $f_X$ in case of a known and an unknown density $f_\epsilon$ in
Sections 3 and 4, respectively. All proofs can be found in the Appendix.



2. Methodology. 



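2.1. Background to methodology. In this paper, we suppose that $f_X$ and
$f_\epsilon$ [hence also $f_Y$] are contained in the set $\mathcal{D}$ of
all densities in $L^2(\mathbb{R})$, which is endowed with the usual norm
$\|\cdot\|$. We use the notation $[\mathcal{F}g](t)$ for the Fourier
transform $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp(-itx)g(x)\,dx$
of a function $g \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, which is
unitary. Since $X$ and $\epsilon$ are assumed to be independent, the Fourier
transform of $f_Y$ satisfies
$\mathcal{F}f_Y = \sqrt{2\pi}\cdot\mathcal{F}f_X\cdot\mathcal{F}f_\epsilon$.
Therefore, assuming $|[\mathcal{F}f_\epsilon](t)|^2 > 0$, for all
$t \in \mathbb{R}$, the density $f_X$ can be recovered from $f_Y$ and
$f_\epsilon$ by

(2.1)  $\mathcal{F}f_X = \frac{\mathcal{F}f_Y\cdot\overline{\mathcal{F}f_\epsilon}}{\sqrt{2\pi}\cdot|\mathcal{F}f_\epsilon|^2},$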



where $\overline{\mathcal{F}f_\epsilon}$ denotes the complex conjugate of
$\mathcal{F}f_\epsilon$. It is well known that replacing in (2.1) the
unknown density $f_Y$ by a consistent estimator $\hat f_Y$ does not in
general lead to a consistent estimator of $f_X$. To be more precise, since
$|\mathcal{F}f_\epsilon|^{-1}$ is not bounded,
$\mathbb{E}\|\hat f_Y - f_Y\|^2 = o(1)$ does not generally imply
$\mathbb{E}\|[\mathcal{F}\hat f_Y - \mathcal{F}f_Y]\cdot|\mathcal{F}f_\epsilon|^{-1}\|^2 = o(1)$,
that is, the inverse operation of a convolution is not continuous.
Therefore, the deconvolution problem is ill posed in the sense of Hadamard.
In the literature, several approaches are proposed in order to circumvent
this instability issue. Essentially, all of them replace (2.1) with a
regularized version that avoids having the denominator becoming too small
[e.g., nonparametric methods using a kernel with limited bandwidth estimate
$[\mathcal{F}f_Y](t)$, and also $[\mathcal{F}f_X](t)$, for $|t|$ larger than
a threshold by zero]. There are a large number of alternative regularization
schemes in the numerical analysis literature available, such as the Tikhonov
regularization, Landweber iteration or the $\nu$-methods, to name but a few
(cf. Engl, Hanke and Neubauer [10]). However, in this paper we regularize
(2.1) by introducing a threshold $\alpha > 0$ and a function
$\ell_s(t) := (1 + t^2)^{s/2}$, $s, t \in \mathbb{R}$, that is, for
$s \ge 0$, we consider the regularized version given by

(2.2)  $\mathcal{F}f^\alpha_{Xs} := \frac{\mathcal{F}f_Y\cdot\overline{\mathcal{F}f_\epsilon}}{\sqrt{2\pi}\cdot|\mathcal{F}f_\epsilon|^2}\cdot 1\{|\mathcal{F}f_\epsilon/\ell_s|^2 \ge \alpha\}.$

Then, $f^\alpha_{Xs}$ belongs to the well-known Sobolev space $H_s$ defined by

(2.3)  $H_s := \Bigl\{f \in L^2(\mathbb{R}) : \|f\|_s^2 := \int_{-\infty}^{\infty}(1 + t^2)^s\,|[\mathcal{F}f](t)|^2\,dt < \infty\Bigr\}.$

Moreover, let $H_s^\rho := \{f \in H_s : \|f\|_s \le \rho\}$, for
$\rho > 0$. Thresholding in the Fourier domain has been used, for example,
in Devroye [7], Liu and Taylor [24], Mair and Ruymgaart [26] or Neumann [31]
and coincides with an approach called spectral cut-off in the numerical
analysis literature (cf. Tautenhahn [37]).
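The unitary convention and the cut-off set of (2.2) can be checked numerically in a simple case. The following is a minimal sketch, assuming Gaussian densities for both $X$ and $\epsilon$ (so that all transforms are available in closed form, cf. Example 3.1); the function names, the grid and the values of the standard deviations and of the threshold are illustrative choices and not part of the paper.

```python
import numpy as np

def ft_gauss(t, sigma):
    """Unitary Fourier transform of the N(0, sigma^2) density."""
    return np.exp(-0.5 * (sigma * t) ** 2) / np.sqrt(2.0 * np.pi)

def ell(t, s):
    """Weight function l_s(t) = (1 + t^2)^(s/2)."""
    return (1.0 + t ** 2) ** (s / 2.0)

def cutoff_set(ft_eps, t, s, alpha):
    """Indicator of the set {|F f_eps / l_s|^2 >= alpha} on the grid t."""
    return np.abs(ft_eps / ell(t, s)) ** 2 >= alpha

t = np.linspace(-10.0, 10.0, 2001)
sx, se = 1.0, 0.5                        # illustrative standard deviations
ft_x, ft_e = ft_gauss(t, sx), ft_gauss(t, se)
ft_y = ft_gauss(t, np.hypot(sx, se))     # Y = X + eps is N(0, sx^2 + se^2)

# Convolution theorem in the unitary convention: F f_Y = sqrt(2 pi) F f_X F f_eps.
assert np.allclose(ft_y, np.sqrt(2 * np.pi) * ft_x * ft_e)

# Regularized inversion: divide only where the denominator is not too small.
keep = cutoff_set(ft_e, t, s=0.0, alpha=1e-4)
ratio = ft_y * np.conj(ft_e) / (np.sqrt(2 * np.pi) * np.abs(ft_e) ** 2)
ft_x_reg = np.where(keep, ratio, 0.0)
```

On the retained frequencies the regularized transform coincides with $\mathcal{F}f_X$, and it is set to zero elsewhere, which is the source of the regularization bias discussed in Section 3.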

2.2. Estimation of $f_X$ when $f_\epsilon$ is known. Let $Y_1,\dots,Y_n$ be
an i.i.d. sample of $Y$, which we use to construct an estimator $\hat f_Y$
of $f_Y$. The estimator $\hat f_{Xs}$ of $f_X$ based on the regularized
version (2.2) is then defined by

(2.4)  $\mathcal{F}\hat f_{Xs} := \frac{\mathcal{F}\hat f_Y\cdot\overline{\mathcal{F}f_\epsilon}}{\sqrt{2\pi}\cdot|\mathcal{F}f_\epsilon|^2}\cdot 1\{|\mathcal{F}f_\epsilon/\ell_s|^2 \ge \alpha\},$

where the threshold $\alpha := \alpha(n)$ has to tend to zero as the sample
size $n$ increases. The truncation in the Fourier domain will lead as usual
to a bias term which is a function of the threshold. In Lemma A.1 in the
Appendix, we show that by using this specific structure for the truncation,
the functional form of the bias term is determined by the relationship
between $\mathcal{F}f_X$ and $\mathcal{F}f_\epsilon$. In this paper, we
stick to a nonparametric kernel estimation approach, but we would like to
stress that any other density estimation procedure could be used as well.
The kernel estimator of $f_Y$ is defined by

(2.5)  $\hat f_Y(y) := \frac{1}{nh}\sum_{j=1}^{n} K\Bigl(\frac{Y_j - y}{h}\Bigr), \quad y \in \mathbb{R},$

where $h > 0$ is a bandwidth and $K$ a kernel function. As usual in the
context of nonparametric kernel estimation the bandwidth $h$ has to tend to
zero as the sample size $n$ increases. In order to derive a rate of
convergence of $\hat f_Y$, we follow Parzen [33] and consider, for each
$r > 0$, the class of kernel functions

(2.6)  $\mathcal{K}_r := \Bigl\{K \in L^1(\mathbb{R}) \cap L^2(\mathbb{R}) : \lim_{t\to 0}\frac{|1 - \sqrt{2\pi}\,[\mathcal{F}K](t)|}{|t|^r} = \kappa_r < \infty\Bigr\}.$

If $f_Y \in H_r^\rho$, for $\rho, r > 0$, then the MISE of the estimator
$\hat f_Y$ given in (2.5), constructed by using a kernel
$K \in \mathcal{K}_r$ and a bandwidth $h = cn^{-1/(2r+1)}$, $c > 0$, is of
order $n^{-2r/(2r+1)}$ (cf. Parzen [33]) and, hence, obtains the minimax
optimal order over the class $H_r^\rho$ (cf. [40], Chapter 24).
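The construction of this subsection can be sketched in a few lines. The following is a minimal illustration, not the author's implementation: it assumes a Gaussian kernel $K$ (for which the transform of the kernel estimator is the empirical characteristic function damped by $\exp(-h^2t^2/2)$), a Laplace error density with known scale, and a simple Riemann sum for the inverse transform. The function name `deconvolve_known_error`, the grids and all numerical values are hypothetical choices.

```python
import numpy as np

def deconvolve_known_error(y, ft_eps, t, x, h, alpha, s=0.0):
    """Spectral cut-off estimate of f_X on the grid x, in the spirit of (2.4).

    y      : observations Y_1, ..., Y_n
    ft_eps : unitary Fourier transform of the known error density on the grid t
    h      : bandwidth of a Gaussian kernel (illustrative choice of K)
    alpha  : threshold of the spectral cut-off
    """
    dt = t[1] - t[0]
    # Fourier transform of the kernel estimator (2.5) for a Gaussian kernel.
    ecf = np.exp(-1j * np.outer(t, y)).mean(axis=1)
    ft_y_hat = np.exp(-0.5 * (h * t) ** 2) * ecf / np.sqrt(2 * np.pi)
    # Keep only the frequencies with |F f_eps / l_s|^2 >= alpha.
    keep = np.abs(ft_eps) ** 2 / (1.0 + t ** 2) ** s >= alpha
    ft_x_hat = np.zeros_like(ft_y_hat)
    ft_x_hat[keep] = (ft_y_hat[keep] * np.conj(ft_eps[keep])
                      / (np.sqrt(2 * np.pi) * np.abs(ft_eps[keep]) ** 2))
    # Invert the unitary transform by a Riemann sum over the grid t.
    return (np.exp(1j * np.outer(x, t)) @ ft_x_hat).real * dt / np.sqrt(2 * np.pi)

rng = np.random.default_rng(0)
n, b = 2000, 0.3                       # illustrative sample size and Laplace scale
y = rng.normal(size=n) + rng.laplace(scale=b, size=n)
t = np.linspace(-15.0, 15.0, 601)
x = np.linspace(-5.0, 5.0, 201)
ft_eps = 1.0 / (np.sqrt(2 * np.pi) * (1.0 + (b * t) ** 2))   # Laplace error
f_hat = deconvolve_known_error(y, ft_eps, t, x, h=0.2, alpha=0.01)
f_true = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
ise = np.sum((f_hat - f_true) ** 2) * (x[1] - x[0])           # integrated squared error
```

In this sketch the bandwidth and the threshold are fixed by hand; the rates of Section 3 indicate how they would be coupled to the sample size.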
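2.3. Estimation of $f_X$ given an estimator of $f_\epsilon$. Suppose
$Y_1,\dots,Y_n$ and $\epsilon_1,\dots,\epsilon_m$ form i.i.d. samples of
$f_Y$ and $f_\epsilon$, respectively. We consider again the nonparametric
kernel estimator $\hat f_Y$ defined in (2.5). In addition, we estimate the
Fourier transform $\mathcal{F}f_\epsilon$ using its empirical counterpart,
that is,

(2.7)  $[\widehat{\mathcal{F}f_\epsilon}](t) := \frac{1}{m\sqrt{2\pi}}\sum_{j=1}^{m} e^{-it\epsilon_j}, \quad t \in \mathbb{R}.$

Then, the estimator $\hat f_{Xs}$ based on the regularized version (2.2) is
defined by

(2.8)  $\mathcal{F}\hat f_{Xs} := \frac{\mathcal{F}\hat f_Y\cdot\overline{\widehat{\mathcal{F}f_\epsilon}}}{\sqrt{2\pi}\cdot|\widehat{\mathcal{F}f_\epsilon}|^2}\cdot 1\{|\widehat{\mathcal{F}f_\epsilon}/\ell_s|^2 \ge \alpha\},$

where $\alpha := \alpha(n, m)$ has to tend to zero as the sample sizes $n$
and $m$ increase.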
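The two ingredients that change when the error density is estimated are the empirical transform of the error sample and the thresholded multiplier applied to the transform of the kernel estimator. A minimal sketch follows, assuming the unitary normalization used throughout and a Laplace error sample for illustration; the names `empirical_ft` and `cutoff_inverse` and all numerical values are hypothetical.

```python
import numpy as np

def empirical_ft(sample, t):
    """Empirical counterpart of the unitary Fourier transform on the grid t."""
    return np.exp(-1j * np.outer(t, sample)).mean(axis=1) / np.sqrt(2 * np.pi)

def cutoff_inverse(ft_eps_hat, t, alpha, s=0.0):
    """Multiplier conj(phi) / (sqrt(2 pi) |phi|^2) on the kept set, zero elsewhere."""
    keep = np.abs(ft_eps_hat) ** 2 / (1.0 + t ** 2) ** s >= alpha
    weight = np.zeros_like(ft_eps_hat)
    weight[keep] = (np.conj(ft_eps_hat[keep])
                    / (np.sqrt(2 * np.pi) * np.abs(ft_eps_hat[keep]) ** 2))
    return weight, keep

rng = np.random.default_rng(1)
m, b = 5000, 0.3                       # illustrative error sample size and scale
eps = rng.laplace(scale=b, size=m)
t = np.linspace(-15.0, 15.0, 601)
ft_eps_hat = empirical_ft(eps, t)
ft_eps_true = 1.0 / (np.sqrt(2 * np.pi) * (1.0 + (b * t) ** 2))
weight, keep = cutoff_inverse(ft_eps_hat, t, alpha=0.01)
max_err = np.max(np.abs(ft_eps_hat - ft_eps_true))
```

Because the indicator is evaluated at the estimated transform, the multiplier is bounded by $(2\pi\alpha)^{-1/2}$ in modulus (for $s = 0$), so the random denominator cannot blow up the estimate.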



3. Theoretical properties of the estimator when $f_\epsilon$ is known. We
shall measure the performance of the estimator $\hat f_{Xs}$ defined in
(2.4) by the $H_s$-risk, that is, $\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2$,
provided $f_X \in H_p$, for some $p \ge s \ge 0$. For an integer $k$, the
Sobolev norm $\|g\|_k$ is equivalent to $\|g\| + \|g^{(k)}\|$, where the
$k$th weak derivative $g^{(k)}$ of $g$ satisfies
$[\mathcal{F}g^{(k)}](t) := (-it)^k[\mathcal{F}g](t)$. Therefore, the
$H_k$-risk reflects the performance of $\hat f_{Xk}$ and
$\hat f_{Xk}^{(k)}$ as estimators of $f_X$ and $f_X^{(k)}$, respectively.
However, in what follows a situation without an a priori assumption on the
smoothness of $f_X$ is also covered considering $p = s = 0$.

The $H_s$-risk is essentially determined by the MISE of the estimator of
$f_Y$ and by the regularization bias. To be more precise, by using
$f^\alpha_{Xs}$ given in (2.2) and assuming $f_X \in H_p$, for some
$p \ge s \ge 0$, we bound the $H_s$-risk by

(3.1)  $\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 \le \pi^{-1}\alpha^{-1}\,\mathbb{E}\|\hat f_Y - f_Y\|^2 + 2\|f^\alpha_{Xs} - f_X\|_s^2,$

where, due to Lebesgue's dominated convergence theorem, the regularization
bias satisfies $\|f^\alpha_{Xs} - f_X\|_s^2 = o(1)$ as $\alpha$ tends to
zero.

Proposition 3.1. Suppose that $f_X \in H_p$, $p \ge 0$. Let $\hat f_Y$ be a
consistent estimator of $f_Y$, that is,
$\mathbb{E}\|\hat f_Y - f_Y\|^2 = o(1)$ as $n \to \infty$. Consider, for
$0 \le s \le p$, the estimator $\hat f_{Xs}$ given in (2.4) with threshold
satisfying $\alpha = o(1)$ and
$\mathbb{E}\|\hat f_Y - f_Y\|^2/\alpha = o(1)$ as $n \to \infty$. Then,
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 = o(1)$ as $n \to \infty$.

In order to obtain a rate of convergence of the regularization bias and,
hence, the $H_s$-risk of $\hat f_{Xs}$, we consider first a polynomial
source condition

(3.2)  $\rho := \|\ell_s\cdot\mathcal{F}f_X\cdot(|\mathcal{F}f_\epsilon/\ell_s|^2)^{-\beta/2}\| < \infty$  for some $\beta > 0$, $s \ge 0$.

Note that (3.2) implies that $f_X \in H_s$.
Example 3.1. To illustrate this and also the following source conditions,
let us consider three different types of densities. These are, (i) the
density $g$ of a symmetrized $\chi^2$ distribution with $k$ degrees of
freedom, that is, $[\mathcal{F}g](t) = (2\pi)^{-1/2}(1 + 4t^2)^{-k/2}$,
(ii) the density $g$ of a centered Cauchy distribution with scale parameter
$\gamma > 0$, that is,
$[\mathcal{F}g](t) = (2\pi)^{-1/2}\exp(-\gamma|t|)$, and (iii) the density
$g$ of a centered normal distribution with variance $\sigma^2 > 0$, that is,
$[\mathcal{F}g](t) = (2\pi)^{-1/2}\exp(-\sigma^2t^2/2)$. Suppose $f_X$ and
$f_\epsilon$ are symmetrized $\chi^2$ densities with $k_X$ and $k_\epsilon$
degrees of freedom, respectively. Then, the polynomial source condition
(3.2) is only satisfied for $0 \le s < k_X - 1/2$. If $f_X$ and
$f_\epsilon$ are Cauchy densities or $f_X$ and $f_\epsilon$ are Gaussian
densities, then $\mathcal{F}f_X$ and $\mathcal{F}f_\epsilon$ descend
exponentially and (3.2) holds for all $s \ge 0$.

Theorem 3.2. Suppose that $f_X$ satisfies the polynomial source condition
(3.2), for some $s \ge 0$ and $\beta > 0$. Consider the estimator
$\hat f_{Xs}$ defined in (2.4) by using a threshold
$\alpha = c\cdot(\mathbb{E}\|\hat f_Y - f_Y\|^2)^{1/(\beta+1)}$, $c > 0$.
Then, there exists a constant $C > 0$ depending only on $\rho$ given in
(3.2), $\beta$ and $c$ such that
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 \le C\cdot(\mathbb{E}\|\hat f_Y - f_Y\|^2)^{\beta/(\beta+1)}$,
as $\mathbb{E}\|\hat f_Y - f_Y\|^2 \to 0$.

Remark 3.1. In Lemma A.1 in the Appendix, we show by applying standard
techniques for regularization methods that the polynomial source condition
(3.2) implies $\|f^\alpha_{Xs} - f_X\|_s^2 \le \alpha^\beta\rho^2$. Then, we
obtain the result by balancing in (3.1) the two terms on the right-hand
side. On the other hand, from Theorem 4.11 in Engl, Hanke and Neubauer [10]
it follows that $\|f^\alpha_{Xs} - f_X\|_s^2 = O(\alpha^\eta)$, for some
$\eta > 0$, implies (3.2) for all $\beta < \eta$, that is, the order
$O(\alpha^\beta)$ is optimal over the class {$f_X$ satisfies (3.2)}.
Therefore, one would expect that an optimal estimation of $f_Y$ leads to an
optimal estimation of $f_X$. However, the polynomial source condition is not
sufficient to derive an optimal rate of convergence of the MISE of
$\hat f_Y$ over the class
$\{f_Y = f_\epsilon \star f_X : f_X \text{ satisfies (3.2)}\}$. For example,
if $f_\epsilon$ is a Gaussian density, this class contains only analytic
functions, while it equals the Sobolev space $H_{s+\beta(s+2)+2}$ when
$f_\epsilon$ is a Laplace density.
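The balancing step mentioned in Remark 3.1 can be written out as follows. This is a sketch that assumes the bias bound of Lemma A.1 takes the form $\alpha^\beta\rho^2$ (which is what (3.2) and the definition of the cut-off set give directly) and uses the shorthand $\delta := \mathbb{E}\|\hat f_Y - f_Y\|^2$, a symbol introduced here only for brevity.

```latex
\begin{align*}
\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2
  &\le \pi^{-1}\alpha^{-1}\delta + 2\alpha^{\beta}\rho^2
     && \text{by (3.1) and the bias bound} \\
  &= \pi^{-1}c^{-1}\,\delta^{1 - 1/(\beta+1)} + 2c^{\beta}\rho^2\,\delta^{\beta/(\beta+1)}
     && \text{with } \alpha = c\,\delta^{1/(\beta+1)} \\
  &= \bigl(\pi^{-1}c^{-1} + 2c^{\beta}\rho^2\bigr)\,\delta^{\beta/(\beta+1)} .
\end{align*}
```

Both terms are then of the same order in $\delta$, and the resulting constant depends only on $\rho$, $\beta$ and $c$, in line with the statement of Theorem 3.2.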

Without further information about $f_\epsilon$ it is difficult to give for
arbitrary $\beta > 0$ an interpretation of the polynomial source condition.
However, if we suppose additionally that $f_\epsilon$ is ordinary smooth,
that is, there exists $a > 1/2$ and a constant $d > 0$, such that

(3.3)  $d \le (1 + t^2)^a\,|[\mathcal{F}f_\epsilon](t)|^2 \le d^{-1}$  for all $t \in \mathbb{R}$,

then the smoothness condition $f_X \in H_p$, for some $p > 0$, is equivalent
to the polynomial source condition (3.2) with $0 \le s < p$ and
$\beta = (p - s)/(s + a)$. Moreover, we have
$H_{p+a} = \{f_Y = f_\epsilon \star f_X : f_X \in H_p\}$, for all
$p \ge 0$. Therefore, the convolution with $f_\epsilon$ is also called
finitely smoothing (cf. Mair and Ruymgaart [26]). From Theorem 3.2, we
obtain the following corollary, which establishes the optimal rate of
convergence of $\hat f_{Xs}$ over $H_p$.

Corollary 3.3. Suppose that $f_X \in H_p$, $p > 0$ and $f_\epsilon$
satisfies (3.3) for $a > 1/2$. Let $\hat f_Y$ defined in (2.5) be
constructed using a kernel $K \in \mathcal{K}_{p+a}$ [see (2.6)] and a
bandwidth $h = cn^{-1/(2(p+a)+1)}$, $c > 0$. Consider for $0 \le s < p$ the
estimator $\hat f_{Xs}$ defined in (2.4) with threshold
$\alpha = cn^{-2(a+s)/(2(p+a)+1)}$, $c > 0$. Then, we have
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 = O(n^{-2(p-s)/(2(a+p)+1)})$ as
$n \to \infty$.
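The exponents in Corollary 3.3 follow from Theorem 3.2 by inserting $\beta = (p - s)/(s + a)$ and a MISE of order $n^{-2(p+a)/(2(p+a)+1)}$. The following sketch makes this arithmetic explicit with exact fractions; the function name and the dictionary keys are hypothetical and the chosen values of $p$, $a$ and $s$ are only an example.

```python
from fractions import Fraction

def ordinary_smooth_exponents(p, a, s):
    """Exponents of n in Corollary 3.3 for f_X in H_p and error smoothness a."""
    p, a, s = Fraction(p), Fraction(a), Fraction(s)
    denom = 2 * (p + a) + 1
    beta = (p - s) / (s + a)                # index in the source condition (3.2)
    return {
        "beta": beta,
        "bandwidth": 1 / denom,             # h = c n^(-1/(2(p+a)+1))
        "mise": 2 * (p + a) / denom,        # MISE of the kernel estimator is n^(-mise)
        "threshold": 2 * (a + s) / denom,   # alpha = c n^(-2(a+s)/(2(p+a)+1))
        "risk": 2 * (p - s) / denom,        # H_s-risk is O(n^(-risk))
    }

e = ordinary_smooth_exponents(p=2, a=2, s=0)
rougher_error = ordinary_smooth_exponents(p=2, a=1, s=0)
```

For $p = 2$, $a = 2$, $s = 0$ this gives $\beta = 1$, a threshold of order $n^{-4/9}$ and a risk of order $n^{-4/9}$; with $a = 1$ the risk exponent increases to $4/7$, which illustrates the role of $a$ as a degree of ill posedness.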

Remark 3.2. The rate of convergence in the last result is known to be
minimax optimal over the class $H_p^\rho$, provided that the density
$f_\epsilon$ satisfies (3.3) (cf. Mair and Ruymgaart [26]). Since under the
assumptions of the corollary $f_X$ belongs to $H_p$ if and only if $f_Y$
lies in $H_{p+a}$, it follows that the kernel estimator of $f_Y$ is
constructed such that its MISE has the minimax optimal order over the class
$H_{p+a}^\rho$. Moreover, using an estimator of $f_Y$ which does not have an
order optimal MISE, the estimator of $f_X$ would not reach the minimax
optimal rate of convergence. Hence, in this situation the optimal estimation
of $f_Y$ is necessary to obtain an optimal estimator of $f_X$. We shall
emphasize the role of the parameter $a$, which specifies through the
condition (3.3) the tail behavior of the Fourier transform
$\mathcal{F}f_\epsilon$. As we see, if the value $a$ increases, the
obtainable optimal rate of convergence decreases. Therefore, the parameter
$a$ is often called degree of ill posedness (cf. Natterer [30]).

If, for example, $f_X$ is a Laplace and $f_\epsilon$ is a Cauchy or Gaussian
density, then not a polynomial but a logarithmic source condition holds
true, that is,

(3.4)  $\rho := \|\ell_s\cdot\mathcal{F}f_X\cdot|\log(|\mathcal{F}f_\epsilon/\ell_s|^2)|^{\beta/2}\| < \infty$  for some $\beta > 0$, $s \ge 0$.



Theorem 3.4. Let $f_X$ satisfy the logarithmic source condition (3.4), for
some $s \ge 0$ and $\beta > 0$. Consider the estimator $\hat f_{Xs}$ defined
in (2.4) by using a threshold
$\alpha = c\cdot(\mathbb{E}\|\hat f_Y - f_Y\|^2)^{1/2}$, for some $c > 0$.
Then, there exists a constant $C > 0$ depending only on $\rho$ given in
(3.4), $\beta$ and $c$ such that we have
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 \le C\cdot|\log(\mathbb{E}\|\hat f_Y - f_Y\|^2)|^{-\beta}$,
as $\mathbb{E}\|\hat f_Y - f_Y\|^2 \to 0$.

Additionally, if we assume that the density $f_\epsilon$ is supersmooth,
that is, there exists $a > 0$ and a constant $d > 0$, such that

(3.5)  $d \le (1 + t^2)^a\,|\log(|[\mathcal{F}f_\epsilon](t)|^2)|^{-1} \le d^{-1}$  for all $t \in \mathbb{R}$,

then the smoothness condition $f_X \in H_p$, $p > 0$ is equivalent to the
logarithmic source condition (3.4), with $0 \le s < p$ and
$\beta = (p - s)/a$. Moreover, $f_\epsilon$, and therefore $f_Y$, belong to
$H_r$, for all $r > 0$, and given $a > 1$, $f_\epsilon$ and hence $f_Y$, are
analytic functions (cf. Kawata [20]). Therefore, the convolution with
$f_\epsilon$ is called infinitely smoothing (cf. Mair and Ruymgaart [26]).

Corollary 3.5. Suppose that $f_X \in H_p$, $p > 0$ and $f_\epsilon$
satisfies (3.5) for some $a > 0$. Let $\hat f_Y$ given in (2.5) be
constructed by using a kernel $K \in \mathcal{K}_r$ [see (2.6)] and a
bandwidth $h = cn^{-1/(2r+1)}$, $c, r > 0$. Consider, for $0 \le s < p$, the
estimator $\hat f_{Xs}$ defined in (2.4) with threshold
$\alpha = cn^{-r/(2r+1)}$, $c > 0$. Then, we have
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 = O((\log n)^{-(p-s)/a})$, as
$n \to \infty$.

Remark 3.3. The rate of convergence in Corollary 3.5 is again minimax
optimal over the class $H_p^\rho$, given that the density $f_\epsilon$
satisfies (3.5) (cf. Mair and Ruymgaart [26]). It seems rather surprising
that, in contrast to Corollary 3.3, an increasing value $r$ improves the
order of the MISE of the estimator $\hat f_Y$ uniformly over the class
$\{f_Y = f_\epsilon \star f_X : f_X \in H_p^\rho\}$, but does not change the
order of the $H_s$-risk of $\hat f_{Xs}$ (compare Remark 3.2). This,
however, is due to the fact that the $H_s$-risk of $\hat f_{Xs}$ is of order
$O(n^{-r/(2r+1)}) + O((\log n^{r/(2r+1)})^{-(p-s)/a}) = O((\log n)^{-(p-s)/a})$.
So $r$ does not appear formally, but is actually hidden in the order symbol.
Note that neither the bandwidth $h$ nor the threshold $\alpha$ depends on
the level $p$ of smoothness of $f_X$, that is, the estimator is adaptive.
Moreover, the parameter $a$ specifying in condition (3.5) the tail behavior
of the Fourier transform $\mathcal{F}f_\epsilon$, in this situation also
describes the degree of ill posedness.
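The statement that $r$ is hidden in the order symbol can be made concrete: the bias term $(\log n^{r/(2r+1)})^{-(p-s)/a}$ differs from $(\log n)^{-(p-s)/a}$ only by the factor $(r/(2r+1))^{-(p-s)/a}$, which does not depend on $n$. A small numerical sketch, with a hypothetical function name and illustrative parameter values:

```python
import math

def log_rate_ratio(n, r, p, s, a):
    """Ratio of (log n^(r/(2r+1)))^(-(p-s)/a) to (log n)^(-(p-s)/a)."""
    gamma = (p - s) / a
    bias = (r / (2 * r + 1) * math.log(n)) ** (-gamma)
    return bias / math.log(n) ** (-gamma)

# The ratio is the same for every n; here (2/5)^(-2) = 6.25.
ratios = [log_rate_ratio(n, r=2.0, p=2.0, s=0.0, a=1.0) for n in (1e3, 1e6, 1e12)]
```

The constant grows as $r$ decreases, so the choice of $r$ affects the bound through the constant only and not through the logarithmic rate.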

Consider, for example, a Cauchy density $f_X$ and a Gaussian density
$f_\epsilon$, then neither the polynomial source condition (3.2) nor the
logarithmic source condition (3.4) is appropriate. However, both source
conditions can be unified and extended using an index function
$\kappa: (0, 1] \to \mathbb{R}^+$, which we always assume here to be a
continuous and strictly increasing function with $\kappa(0+) = 0$ (cf. Nair,
Pereverzev and Tautenhahn [29]). Then, we consider a general source
condition

(3.6)  $\rho := \|\ell_s\cdot\mathcal{F}f_X\cdot|\kappa(|\mathcal{F}f_\epsilon/\ell_s|^2)|^{-1/2}\| < \infty$  for some $s \ge 0$.



Theorem 3.6. Let $f_X$ satisfy the general source condition (3.6) for some
concave index function $\kappa$ and $s \ge 0$. Denote by $\Phi$ and $\omega$
the inverse function of $\kappa$, and $\omega^{-1}(t) := t\Phi(t)$,
respectively. Consider the estimator $\hat f_{Xs}$ defined in (2.4) by using
$\alpha = c\cdot\mathbb{E}\|\hat f_Y - f_Y\|^2/\omega(c\cdot\mathbb{E}\|\hat f_Y - f_Y\|^2)$,
$c > 0$. Then, there exists a constant $C > 0$ depending only on $\rho$
given in (3.6) and $c$ such that
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 \le C\cdot\omega(\mathbb{E}\|\hat f_Y - f_Y\|^2)$,
as $\mathbb{E}\|\hat f_Y - f_Y\|^2 \to 0$.
Remark 3.4. (i) Let $S_\kappa^\gamma$ be the set of all densities $f_X$
satisfying the general source condition (3.6) with $\rho \le \gamma$. We
define the modulus of continuity
$\omega(\delta, S_\kappa^\gamma) := \sup\{\|g\|_s : g \in S_\kappa^\gamma, \|f_\epsilon \star g\| \le \delta\}$
of the inverse operation of a convolution with $f_\epsilon$ over the set
$S_\kappa^\gamma \subset H_s$. Since the index function $\kappa$ is assumed
to be concave, it follows that the inverse function of $\omega$ is convex.
Then, by using Theorem 2.2 in Nair, Pereverzev and Tautenhahn [29], we have
$\omega(\delta) = O(\omega(\delta, S_\kappa^\gamma))$, as $\delta \to 0$. In
the case of a deterministic approximation $f_Y^\delta$ of $f_Y$ with
$\|f_Y^\delta - f_Y\| \le \delta$, it is shown in Vainikko and
Veretennikov [39] that $\omega(\delta, S_\kappa^\gamma)$ provides a lower
bound over the class $S_\kappa^\gamma$ of the approximation error for any
deconvolution method based only on $f_Y^\delta$. Therefore, we conjecture
that the bound in Theorem 3.6 is order optimal over the class
$S_\kappa^\gamma$, given the MISE of $\hat f_Y$ is order optimal over the
class $\{f_Y = f_X \star f_\epsilon, f_X \in S_\kappa^\gamma\}$.

(ii) Define $\kappa(t) := |\log(ct)|^{-\beta}$, $c := \exp(-1 - \beta)$.
Then, $\kappa$ is a concave index function and
$\omega(\delta) = |\log\delta|^{-\beta}(1 + o(1))$, as $\delta \to 0$ (see
Mair [25]). Thus, the result under a logarithmic source condition
(Theorem 3.4) is covered by Theorem 3.6. However, the index function
$\kappa(t) = t^\beta$ is concave only if $\beta \le 1$, and hence the result
in the case of a polynomial source condition (Theorem 3.2) is only partially
obtained by Theorem 3.6. Nevertheless, we can apply Theorem 3.6 in the
situation of a Cauchy density $f_X$ and a Gaussian density $f_\epsilon$
(compare Example 3.1), since in this case, for all
$0 < \beta < 2\gamma/\sigma$ and $s \ge 0$, the general source condition is
satisfied with concave index function
$\kappa(t) = \exp(-\beta\sqrt{|\log(ct)|})$,
$c := \exp(-(\beta^2 \vee 2))$. Moreover, if we denote
$h(t) := (t/\beta + \beta/2)^2$, then
$\omega^{-1}(t) = c'\exp(-h(-\log t))$, with
$c' = \exp(\beta^2/4 + (\beta^2 \vee 2))$. Since
$\omega(t) = \exp(-h^{-1}(-\log(t/c')))$, with
$h^{-1}(y) = \beta\sqrt{y} - \beta^2/2$ for all $y \ge \beta^2/4$, we
conclude that the $H_s$-risk in this case is of order
$\exp(-\beta|\log\mathbb{E}\|\hat f_Y - f_Y\|^2|^{1/2})$.
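The function $\omega$ in Theorem 3.6 is defined only implicitly as the inverse of $t \mapsto t\Phi(t)$, so a numerical check against the closed forms can be reassuring. The sketch below inverts $t\Phi(t)$ by bisection (on a logarithmic scale) and compares the result with $\delta^{\beta/(\beta+1)}$ for $\kappa(t) = t^\beta$ and with $\exp(-h^{-1}(-\log(\delta/c')))$ for the Cauchy/Gaussian index function. The function name `omega`, the bisection scheme and the chosen values of $\beta$ and $\delta$ are illustrative assumptions, not part of the paper.

```python
import math

def omega(phi, delta, u_max, iters=200):
    """Solve u * phi(u) = delta for u in (0, u_max] by bisection on log u.

    phi is the inverse of the index function kappa, and the map
    u -> u * phi(u) is assumed to be increasing on (0, u_max].
    """
    lo, hi = math.log(1e-300), math.log(u_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        u = math.exp(mid)
        if u * phi(u) < delta:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

# Polynomial index function kappa(t) = t^beta, so Phi(u) = u^(1/beta).
beta_p, delta_p = 0.5, 1e-4
w_poly = omega(lambda u: u ** (1.0 / beta_p), delta_p, u_max=1.0)
w_poly_closed = delta_p ** (beta_p / (beta_p + 1.0))

# Cauchy/Gaussian case: kappa(t) = exp(-beta * sqrt(|log(c t)|)).
beta, delta = 1.0, 1e-3
c = math.exp(-max(beta ** 2, 2.0))
phi_cg = lambda u: math.exp(-(math.log(u) / beta) ** 2) / c   # inverse of kappa
u_max = math.exp(-beta * math.sqrt(max(beta ** 2, 2.0)))      # kappa(1)
w_cg = omega(phi_cg, delta, u_max)
c_prime = math.exp(beta ** 2 / 4.0 + max(beta ** 2, 2.0))
w_cg_closed = math.exp(-(beta * math.sqrt(-math.log(delta / c_prime)) - beta ** 2 / 2.0))
```

In both cases the numerically inverted value agrees with the closed form, which also confirms that the polynomial case reproduces the rate of Theorem 3.2.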

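4. Theoretical properties of the estimator when $f_\epsilon$ is unknown. Let
$\tilde f^\alpha_{Xs}$ be defined by
$\mathcal{F}\tilde f^\alpha_{Xs} := 1\{|\widehat{\mathcal{F}f_\epsilon}/\ell_s|^2 \ge \alpha\}\cdot\mathcal{F}f_X$,
where $\widehat{\mathcal{F}f_\epsilon}$ is the empirical counterpart of
$\mathcal{F}f_\epsilon$ from (2.7). Then, assuming $f_X \in H_p$,
$p \ge s$, we bound the $H_s$-risk of $\hat f_{Xs}$ given in (2.8) by

(4.1)  $\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 \le 2\mathbb{E}\|\hat f_{Xs} - \tilde f^\alpha_{Xs}\|_s^2 + 2\mathbb{E}\|\tilde f^\alpha_{Xs} - f_X\|_s^2,$

where we show in the proof of the next proposition that
$\mathbb{E}\|\hat f_{Xs} - \tilde f^\alpha_{Xs}\|_s^2$ is bounded up to a
constant by $\alpha^{-1}(\mathbb{E}\|\hat f_Y - f_Y\|^2 + m^{-1})$, and that
the "regularization error" satisfies
$\mathbb{E}\|\tilde f^\alpha_{Xs} - f_X\|_s^2 = o(1)$ as $\alpha \to 0$ and
$m \to \infty$.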

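Proposition 4.1. Suppose that $f_X \in H_p$, $p \ge 0$. Let $\hat f_Y$ be a
consistent estimator of $f_Y$, that is,
$\mathbb{E}\|\hat f_Y - f_Y\|^2 = o(1)$ as $n \to \infty$. Consider, for
$0 \le s \le p$, the estimator $\hat f_{Xs}$ given in (2.8) with threshold
satisfying $(1/m \vee \mathbb{E}\|\hat f_Y - f_Y\|^2)/\alpha = o(1)$ and
$\alpha = o(1)$ as $n, m \to \infty$. Then,
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 = o(1)$ as $n, m \to \infty$.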
Remark 4.1. If we assume, in addition to the conditions of Proposition 4.1,
that $m^{-1} = O(\mathbb{E}\|\hat f_Y - f_Y\|^2)$ as $n \to \infty$, then we
recover the result of Proposition 3.1 when $f_\epsilon$ is a priori known.
In fact, in all the results below the condition
$m^{-1} = O(\mathbb{E}\|\hat f_Y - f_Y\|^2)$ on the sample size $m$ as
$n \to \infty$ ensures that the error due to the estimation of
$f_\epsilon$ is asymptotically negligible. However, in some special cases an
even slower rate of $m$ is possible (see, e.g., Theorems 4.2 or 4.6).

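Theorem 4.2. Let $f_X$ satisfy the polynomial source condition (3.2) for
some $s \ge 0$ and $\beta > 0$. Consider the estimator $\hat f_{Xs}$ defined
in (2.8) with
$\alpha = c\cdot\{(\mathbb{E}\|\hat f_Y - f_Y\|^2)^{1/(\beta+1)} + m^{-1}\}$,
$c > 0$. Then, for $\mathbb{E}\|\hat f_Y - f_Y\|^2 \to 0$ and
$m \to \infty$, we have
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 \le C\cdot\{(\mathbb{E}\|\hat f_Y - f_Y\|^2)^{\beta/(\beta+1)} + m^{-(1\wedge\beta)}\}$,
for some $C > 0$ depending only on $\rho$ given in (3.2), $\beta$ and $c$.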

Remark 4.2. To illustrate the last result, suppose the sample size $m$
satisfies
$m^{-1} = O((\mathbb{E}\|\hat f_Y - f_Y\|^2)^{(\beta\vee 1)/(\beta+1)})$,
and hence $m$ may grow with a slower rate than implied by the condition
$m^{-1} = O(\mathbb{E}\|\hat f_Y - f_Y\|^2)$ (see Remark 4.1). Then, the
$H_s$-risk of $\hat f_{Xs}$ is bounded up to a constant by
$(\mathbb{E}\|\hat f_Y - f_Y\|^2)^{\beta/(\beta+1)}$, as in the case of an a
priori known $f_\epsilon$ (see Theorem 3.2).

The next assertion shows that the second term given in the bound of
Theorem 4.2 cannot be avoided when the samples from $f_Y$ and $f_\epsilon$
are independent. For $f \in L^2(\mathbb{R})$, let us define the class of
densities

(4.2)  $\mathcal{D}_f^\gamma := \{g \in \mathcal{D} : \gamma|\mathcal{F}f|^2 \le |\mathcal{F}g|^2 \le \gamma^{-1}|\mathcal{F}f|^2\}, \quad \gamma > 0.$

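Proposition 4.3. Suppose the samples from $f_Y$ and $f_\epsilon$ are
independent. Let $f \in \mathcal{D}$, and define
$S_\beta^\rho := \{g \in \mathcal{D} : \|\ell_s\cdot\mathcal{F}g\cdot(|\mathcal{F}f/\ell_s|^2)^{-\beta/2}\| \le \rho\}$,
$\rho > 0$. Then, we have
$\inf_{\tilde f_X}\sup_{f_\epsilon \in \mathcal{D}_f^\gamma,\, f_X \in S_\beta^\rho}\mathbb{E}\|\tilde f_X - f_X\|_s^2 \ge C\cdot m^{-(1\wedge\beta)}$,
for some $C > 0$, depending only on $f$, $\rho$ and $\gamma$.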

If $f_\epsilon$ is ordinary smooth, that is, (3.3) holds for some
$a > 1/2$, then $f_X \in H_p$, $p > 0$ is equivalent to the polynomial
source condition (3.2) with $0 \le s < p$ and $\beta = (p - s)/(s + a)$.
Thus, Theorem 4.2 implies the next assertion.

Corollary 4.4. Suppose $f_\epsilon$ satisfies (3.3) for $a > 1/2$ and
$f_X \in H_p$, $p > 0$. Let $\hat f_Y$ given in (2.5) be constructed by
using a kernel $K \in \mathcal{K}_{p+a}$ and a bandwidth
$h = cn^{-1/(2(p+a)+1)}$, $c > 0$. Consider, for $0 \le s < p$, the
estimator $\hat f_{Xs}$ defined in (2.8) with
$\alpha = c\{n^{-2(s+a)/(2(p+a)+1)} + m^{-1}\}$, $c > 0$. Then,
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 = O(n^{-2(p-s)/(2(p+a)+1)} + m^{-(1\wedge(p-s)/(a+s))})$
as $n, m \to \infty$.
In case of an a priori known and ordinary smooth $f_\epsilon$, the optimal
order of the $H_s$-risk over $H_p^\rho$ is $n^{-2(p-s)/(2(p+a)+1)}$ (see
Remark 3.2), which together with Proposition 4.3 implies the next corollary.

Corollary 4.5. Suppose the samples from $f_Y$ and $f_\epsilon$ are
independent. Denote by $\mathcal{D}_a$ the set of all densities satisfying
(3.3) with $a > 1/2$.
Then,
$\inf_{\tilde f_X}\sup_{f_X \in H_p^\rho,\, f_\epsilon \in \mathcal{D}_a}\mathbb{E}\|\tilde f_X - f_X\|_s^2 \ge C\{n^{-2(p-s)/(2(p+a)+1)} + m^{-(1\wedge(p-s)/(a+s))}\}$,
for some $C > 0$.



Remark 4.3. If the samples from $f_Y$ and $f_\epsilon$ are independent, then
due to Corollaries 4.4 and 4.5 the order of the smallest $m$ for achieving
the same convergence rate as in the case of an a priori known $f_\epsilon$
(Corollary 3.3) is given by
$m^{-1} = O(n^{-2[(p-s)\vee(a+s)]/[2(p+a)+1]})$. We shall emphasize the
interesting ambiguous influences of the parameters $p$ and $a$
characterizing the smoothness of $f_X$ and $f_\epsilon$, respectively. If in
case of $(p - s) < (a + s)$ the value of $a$ decreases or the value of $p$
increases, then the estimation of $f_\epsilon$ is still negligible even when
$m$ grows, relative to $n$, at a slower rate. In the case of
$(p - s) > (a + s)$, by contrast, a decreasing value of $a$ or an increasing
value of $p$ requires $m$ to grow, relative to $n$, at a faster rate.
However, in both cases a decreasing value of $a$ or an increasing value of
$p$ implies a faster optimal rate of convergence of the estimator
$\hat f_{Xs}$.
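The ambiguous influence of $p$ and $a$ described in Remark 4.3 can be read off from the exponents. The following sketch computes, with exact fractions, the exponent $\mu$ such that $m^{-1} = O(n^{-\mu})$ suffices, together with the two exponents appearing in the bound of Corollary 4.4; the function names and the parameter values are hypothetical illustrations.

```python
from fractions import Fraction

def required_m_exponent(p, a, s):
    """Exponent mu with m^(-1) = O(n^(-mu)), as stated in Remark 4.3."""
    p, a, s = Fraction(p), Fraction(a), Fraction(s)
    return 2 * max(p - s, a + s) / (2 * (p + a) + 1)

def risk_exponents(p, a, s):
    """Exponent of n and exponent of m in the bound of Corollary 4.4."""
    p, a, s = Fraction(p), Fraction(a), Fraction(s)
    n_exp = 2 * (p - s) / (2 * (p + a) + 1)
    m_exp = min(Fraction(1), (p - s) / (a + s))
    return n_exp, m_exp

# Case (p - s) < (a + s): a smoother f_X lowers the required growth of m.
mu_low, mu_low_smoother = required_m_exponent(1, 2, 0), required_m_exponent(Fraction(3, 2), 2, 0)
# Case (p - s) > (a + s): a smoother f_X raises the required growth of m.
mu_high, mu_high_smoother = required_m_exponent(3, 1, 0), required_m_exponent(4, 1, 0)
```

With $m = n^\mu$ the term $m^{-(1\wedge(p-s)/(a+s))}$ has exactly the order $n^{-2(p-s)/(2(p+a)+1)}$ of the first term, which is why this $\mu$ marks the threshold at which the estimation of $f_\epsilon$ becomes negligible.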

Theorem 4.6. Let $f_X$ satisfy the logarithmic source condition (3.4), for
some $s \ge 0$ and $\beta > 0$. Consider the estimator $\hat f_{Xs}$ defined
in (2.8) by using a threshold
$\alpha = c\{(\mathbb{E}\|\hat f_Y - f_Y\|^2)^{1/2} + m^{-1/2}\}$, $c > 0$.
Then, for $\mathbb{E}\|\hat f_Y - f_Y\|^2 \to 0$ and $m \to \infty$, we have
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 \le C\{|\log(\mathbb{E}\|\hat f_Y - f_Y\|^2)|^{-\beta} + (\log m)^{-\beta}\}$,
for some $C > 0$ depending only on $\rho$ given in (3.4), $\beta$ and $c$.

Remark 4.4. Assume that, for some $\nu > 0$, the sample size $m$ satisfies
$m^{-1} = O((\mathbb{E}\|\hat f_Y - f_Y\|^2)^\nu)$ as $n \to \infty$, and
hence $m$ may grow with a far slower rate than implied by the condition
$m^{-1} = O(\mathbb{E}\|\hat f_Y - f_Y\|^2)$ (compare Remark 4.1). Then, as
in the case of an a priori known $f_\epsilon$ (see Theorem 3.4), the
$H_s$-risk of $\hat f_{Xs}$ is bounded by
$C|\log(\mathbb{E}\|\hat f_Y - f_Y\|^2)|^{-\beta}$, for some $C > 0$. Note
that the influence of the parameter $\nu$ is hidden in the constant $C$.

The next assertion states that the second term given in the bound of 
Theorem 4.6 cannot be avoided. 

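Proposition 4.7. Suppose the samples from $f_Y$ and $f_\epsilon$ are
independent. Let $f \in \mathcal{D}$, and define
$S_\beta^\rho := \{g \in \mathcal{D} : \|\ell_s\cdot\mathcal{F}g\cdot|\log(|\mathcal{F}f/\ell_s|^2)|^{\beta/2}\| \le \rho\}$,
$\rho > 0$. Then,
$\inf_{\tilde f_X}\sup_{f_\epsilon \in \mathcal{D}_f^\gamma,\, f_X \in S_\beta^\rho}\mathbb{E}\|\tilde f_X - f_X\|_s^2 \ge C(\log m)^{-\beta}$,
for some $C > 0$, depending only on $f$, $\rho$ and $\gamma$.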
Assume that $f_\epsilon$ is supersmooth, that is, (3.5) holds for $a > 0$.
Then, $f_X \in H_p$, $p > 0$, is equivalent to the logarithmic source
condition (3.4) with $0 \le s < p$ and $\beta = (p - s)/a$. Thus,
Theorem 4.6 implies the next assertion.

Corollary 4.8. Suppose $f_\epsilon$ satisfies (3.5), for $a > 0$ and
$f_X \in H_p$, $p > 0$. Let $\hat f_Y$ defined in (2.5) be constructed by
using a kernel $K \in \mathcal{K}_r$ [see (2.6)] and a bandwidth
$h = cn^{-1/(2r+1)}$, $c, r > 0$. Consider, for $0 \le s < p$, the estimator
$\hat f_{Xs}$ defined in (2.8) with
$\alpha = c\{n^{-r/(2r+1)} + m^{-1/2}\}$, $c > 0$. Then,
$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 = O((\log n)^{-(p-s)/a} + (\log m)^{-(p-s)/a})$
as $n, m \to \infty$.

In case of an a priori known and supersmooth $f_\epsilon$, the optimal order
of the $H_s$-risk over $H_p^\rho$ is $(\log n)^{-(p-s)/a}$ (see Remark 3.3),
which together with Proposition 4.7 leads to the next assertion.

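Corollary 4.9. Suppose the samples from $f_Y$ and $f_\epsilon$ are
independent. Denote by $\mathcal{D}_a$ the set of all densities satisfying
(3.5) with $a > 0$. Then,
$\inf_{\tilde f_X}\sup_{f_X \in H_p^\rho,\, f_\epsilon \in \mathcal{D}_a}\mathbb{E}\|\tilde f_X - f_X\|_s^2 \ge C\{(\log n)^{-(p-s)/a} + (\log m)^{-(p-s)/a}\}$.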

Remark 4.5. If we assume $m^{-1} = O(n^{-\nu})$, for some $\nu > 0$, then the
order in the last result simplifies to $(\log n)^{-(p-s)/a}$ and hence, equals the
optimal order for known $f_\varepsilon$ (see Corollary 3.5). Therefore, if the samples
from $f_Y$ and $f_\varepsilon$ are independent, then from Corollaries 4.8 and 4.9 it follows
that the error due to the estimation of $f_\varepsilon$ is asymptotically negligible if and
only if the sample size $m$ grows as some power of $n$. In contrast to the
situation in Corollaries 4.4 and 4.5, if $f_\varepsilon$ is supersmooth, that is, (3.5) holds
for $a > 0$, and $f_X \in H_p$, $p > 0$, then the influence of the parameters $p$ and $a$ is
not ambiguous. A decreasing value of $a$ or an increasing value of $p$ implies a
faster optimal rate of convergence of the estimator $\hat f_{Xs}$, and the rate of $m$
relative to $n$ that is necessary is not affected. Note that the estimator is adaptive as in
the case of a known supersmooth error density (see Remark 3.3). We stress
that the estimation of $f_\varepsilon$ has no influence on the order of the $H_s$-risk of $\hat f_{Xs}$
as long as the sample size $m$ grows as fast as some power of $n$. However, the
influence is clearly hidden in the constant of the order symbol.
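
The statement that a polynomially growing $m$ changes only the constant can be checked numerically. The following is a minimal sketch, assuming the illustrative choice $m = n^{\nu}$; the function names `rate_terms` and `term_ratio` are hypothetical and not part of the paper. It evaluates the two terms of the bound in Corollary 4.8 and shows that their ratio equals $\nu^{-(p-s)/a}$ for every $n$.

```python
import math


def rate_terms(n, nu, p, s, a):
    """Return ((log n)^(-(p-s)/a), (log m)^(-(p-s)/a)) for the assumed choice m = n**nu."""
    beta = (p - s) / a
    log_n = math.log(n)
    log_m = nu * log_n  # log(n**nu), computed without forming the large number m
    return log_n ** (-beta), log_m ** (-beta)


def term_ratio(n, nu, p, s, a):
    """Ratio of the m-term to the n-term; it equals nu**(-(p-s)/a) for every n."""
    term_n, term_m = rate_terms(n, nu, p, s, a)
    return term_m / term_n


if __name__ == "__main__":
    for n in (1e3, 1e6, 1e12):
        print(n, term_ratio(n, nu=0.5, p=2.0, s=0.0, a=1.0))
```

The ratio does not depend on $n$, so both terms have the same order, while a small $\nu$ inflates the constant.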

Theorem 4.10. Let $f_X$ satisfy the general source condition (3.6) for
some concave index function $\kappa$ and $s \ge 0$. Denote by $\Phi$ and $\omega$ the inverse
function of $\kappa$ and of $\omega^{-1}(t) := t\,\Phi(t)$, respectively. Consider $\hat f_{Xs}$ defined in (2.8)
with $\alpha = c\{\mathbb{E}\|\hat f_Y - f_Y\|^2/\omega(\mathbb{E}\|\hat f_Y - f_Y\|^2) + 1/m\}$, $c > 0$. Then, we have

$$\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 \le C\{\omega(\mathbb{E}\|\hat f_Y - f_Y\|^2) + \kappa(1/m)\},$$

as $\mathbb{E}\|\hat f_Y - f_Y\|^2 \to 0$ and $m \to \infty$, for some $C > 0$, depending only on $\rho$ given in (3.6) and $c$.

Remark 4.6. Assume that $m^{-1} = O(\mathbb{E}\|\hat f_Y - f_Y\|^2)$ as $n \to \infty$, then the
$H_s$-risk of $\hat f_{Xs}$ is bounded up to a constant by $\omega(\mathbb{E}\|\hat f_Y - f_Y\|^2)$ as in the case of
an a priori known $f_\varepsilon$ (see Theorem 3.6). Thus, under the general source condition,
supposing $m^{-1} = O(\mathbb{E}\|\hat f_Y - f_Y\|^2)$ is sufficient to ensure that the estimation
of the noise density is asymptotically negligible.
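
The function $\omega$ in Theorem 4.10 is defined only implicitly through $\omega^{-1}(t) = t\,\Phi(t)$. The sketch below shows one way to evaluate it numerically by bisection. It assumes that $\Phi$ is increasing on $[0, 1]$ and that the argument $\delta$ does not exceed $\Phi(1)$; the names `omega`, `cutoff` and `kappa_inverse` are hypothetical. For the polynomial index function $\kappa(t) = t^{\beta}$, $0 < \beta \le 1$, one has $\Phi(t) = t^{1/\beta}$ and hence the closed form $\omega(\delta) = \delta^{\beta/(1+\beta)}$, which serves as a check.

```python
def omega(delta, kappa_inverse, upper=1.0, tol=1e-12):
    """Solve t * Phi(t) = delta for t in [0, upper] by bisection.

    kappa_inverse is Phi, the inverse of the index function kappa.
    The map t -> t * Phi(t) is increasing when Phi is increasing,
    so bisection applies (assumed: upper * Phi(upper) >= delta).
    """
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * kappa_inverse(mid) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cutoff(delta, m, kappa_inverse, c=1.0):
    """Threshold alpha = c * (delta / omega(delta) + 1/m) from Theorem 4.10."""
    return c * (delta / omega(delta, kappa_inverse) + 1.0 / m)


if __name__ == "__main__":
    beta = 0.5
    phi = lambda t: t ** (1.0 / beta)  # inverse of kappa(t) = t**beta
    delta = 1e-3                       # stands in for E||f_Y-hat - f_Y||^2
    print(omega(delta, phi), delta ** (beta / (1.0 + beta)))
    print(cutoff(delta, m=1000, kappa_inverse=phi))
```

Here `delta` plays the role of $\mathbb{E}\|\hat f_Y - f_Y\|^2$, which in practice would be replaced by its known order in $n$.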

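Proposition 4.11. Let the samples from $f_Y$ and $f_\varepsilon$ be independent and
$f \in \mathcal{D}$. Define $\mathcal{S}_f^{\rho} := \{g \in \mathcal{D} : \|\ell_s \cdot \mathcal{F}g \cdot \kappa(|\mathcal{F}f/\ell_s|^2)^{-1/2}\| \le \rho\}$, $\rho > 0$. Then,
we have

$$\inf_{\tilde f_X}\ \sup_{f_X \in \mathcal{S}_f^{\rho},\ f_\varepsilon \in \mathcal{D}_f^{\gamma}} \mathbb{E}\|\tilde f_X - f_X\|_s^2 \ \ge\ C \cdot \kappa(1/m),$$

for some $C > 0$ depending only on $f$, $\rho$ and $\gamma$.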

Remark 4.7. Due to Proposition 4.11, in the case of independent samples
from $f_Y$ and $f_\varepsilon$, the term $\kappa(1/m)$ given in the bound of Theorem 4.10
cannot be avoided. It follows that our estimator $\hat f_{Xs}$ attains the minimax
optimal order over the class in Proposition 4.11 when $\omega(\mathbb{E}\|\hat f_Y - f_Y\|^2)$ is the optimal order for known
$f_\varepsilon$ (compare Remark 3.4).

APPENDIX 

Proof of Proposition 3.1. The proof is based on the decomposition
(3.1), where $\alpha^{-1} \ge \sup_{t \in \mathbb{R}^+} t^{-1}\mathbb{1}\{t \ge \alpha\}$ is used to obtain the first term on
the right-hand side. If $f_X \in H_p$, $p \ge s \ge 0$, then by making use of the relation

$$\|f_{Xs}^{\alpha} - f_X\|_s^2 = \|\mathbb{1}\{|\mathcal{F}f_\varepsilon/\ell_s|^2 < \alpha\} \cdot \ell_s \cdot \mathcal{F}f_X\|^2 \le \|\ell_s \cdot \mathcal{F}f_X\|^2 \le \|f_X\|_p^2 < \infty,$$

the second term satisfies $\|f_{Xs}^{\alpha} - f_X\|_s = o(1)$, as $\alpha \to 0$, due to Lebesgue's
dominated convergence theorem. Therefore, the conditions on $\alpha$ ensure the
convergence to zero of the two terms on the right-hand side in (3.1) as $n$
increases, which gives the result. □

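Assuming $f_\varepsilon$ is known, the next lemma summarizes the essential bounds of
the regularization bias depending on the polynomial, logarithmic or general
source condition.

Lemma A.1. Let $w : \mathbb{R} \to [1, \infty)$ be an arbitrary weight function. Suppose
there exists $\beta > 0$ such that:

(i) $\rho := \|w \cdot \mathcal{F}f_X \cdot (|\mathcal{F}f_\varepsilon|^2/w^2)^{-\beta/2}\| < \infty$ is satisfied, then
$$\|w \cdot \mathcal{F}f_X \cdot \mathbb{1}\{|\mathcal{F}f_\varepsilon|^2/w^2 < \alpha\}\|^2 \le \alpha^{\beta} \cdot \rho^2; \tag{A.1}$$

(ii) $\rho := \|w \cdot \mathcal{F}f_X \cdot |\log(|\mathcal{F}f_\varepsilon|^2/w^2)|^{\beta/2}\| < \infty$ is satisfied, then
$$\|w \cdot \mathcal{F}f_X \cdot \mathbb{1}\{|\mathcal{F}f_\varepsilon|^2/w^2 < \alpha\}\|^2 \le C_\beta \cdot (-\log \alpha)^{-\beta} \cdot \rho^2; \tag{A.2}$$

(iii) $\rho := \|w \cdot \mathcal{F}f_X \cdot \kappa(|\mathcal{F}f_\varepsilon|^2/w^2)^{-1/2}\| < \infty$ is satisfied and assume that
the index function $\kappa$ is concave, then
$$\|w \cdot \mathcal{F}f_X \cdot \mathbb{1}\{|\mathcal{F}f_\varepsilon|^2/w^2 < \alpha\}\|^2 \le C_\kappa \cdot \kappa(\alpha) \cdot \rho^2; \tag{A.3}$$

where $C_\beta$, $C_\kappa$ are positive constants depending only on $\beta$ and $\kappa$, respectively.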

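Proof. Denote $\psi_\alpha := \mathcal{F}f_X \cdot \mathbb{1}\{|\mathcal{F}f_\varepsilon/w|^2 < \alpha\}$. Under the assumption (i)
we have $\|w \cdot \psi_\alpha\|^2 \le \sup_{t \in \mathbb{R}^+} t^{\beta}\mathbb{1}\{t < \alpha\} \cdot \rho^2$, which implies (A.1).

The proof of (A.2) is partially motivated by techniques used in Nair,
Pereverzev and Tautenhahn [29]. Let $\kappa_\beta(t) := |\log(t)|^{-\beta}$, $t \in (0, 1)$, and
$\varphi_\beta(t) := \kappa_\beta^{1/2}(|[\mathcal{F}f_\varepsilon](t)/w(t)|^2)$, $t \in \mathbb{R}$, then for all $t \in \mathbb{R}$ we have $\varphi_\beta(0) \ge \varphi_\beta(t) > 0$.
Under assumption (ii), which may be rewritten as $\rho = \|w \cdot \mathcal{F}f_X/\varphi_\beta\| < \infty$,
we obtain

$$\|w \cdot \psi_\alpha\|^2 = \int w(t)\psi_\alpha(t)\varphi_\beta(t) \cdot \frac{\overline{w(t)[\mathcal{F}f_X](t)}}{\varphi_\beta(t)}\,dt \le \|w \cdot \psi_\alpha \cdot \varphi_\beta\| \cdot \rho \tag{A.4}$$

due to the Cauchy-Schwarz inequality. From (A.4) we conclude

$$\|\mathcal{F}f_\varepsilon \cdot \psi_\alpha\|^2 = \|\mathcal{F}f_\varepsilon \cdot \mathbb{1}\{|\mathcal{F}f_\varepsilon/w|^2 < \alpha\} \cdot \psi_\alpha\|^2 \le \alpha \cdot \|w \cdot \psi_\alpha \cdot \varphi_\beta\| \cdot \rho, \tag{A.5}$$

since $\alpha \ge \sup_{t \in \mathbb{R}^+} t \cdot \mathbb{1}\{t < \alpha\}$. Let $\Phi_\beta$ be the inverse function of $\kappa_\beta$, then
$\Phi_\beta(s) = e^{-s^{-1/\beta}}$, $s > 0$, which is convex on the interval $(0, c_\beta^2]$ with $c_\beta^2 = (1 + \beta)^{-\beta}$.
Define $\gamma_\beta^2 := c_\beta^2/\varphi_\beta^2(0) \wedge 1$. Therefore, Jensen's inequality implies

$$\Phi_\beta\Big(\frac{\gamma_\beta^2 \cdot \|w \cdot \psi_\alpha \cdot \varphi_\beta\|^2}{\|w \cdot \psi_\alpha\|^2}\Big) \le \frac{\int \Phi_\beta(\gamma_\beta^2 \cdot \varphi_\beta^2(t)) \cdot w^2(t) \cdot |\psi_\alpha(t)|^2\,dt}{\int w^2(t) \cdot |\psi_\alpha(t)|^2\,dt},$$

which together with $\Phi_\beta(\gamma_\beta^2 \cdot \varphi_\beta^2(t)) \le \Phi_\beta(\varphi_\beta^2(t)) = |[\mathcal{F}f_\varepsilon](t)|^2/w^2(t)$ gives

$$\Phi_\beta\Big(\frac{\gamma_\beta^2 \cdot \|w \cdot \psi_\alpha \cdot \varphi_\beta\|^2}{\|w \cdot \psi_\alpha\|^2}\Big) \le \frac{\int |[\mathcal{F}f_\varepsilon](t)|^2 \cdot |\psi_\alpha(t)|^2\,dt}{\|w \cdot \psi_\alpha\|^2} = \frac{\|\mathcal{F}f_\varepsilon \cdot \psi_\alpha\|^2}{\|w \cdot \psi_\alpha\|^2}. \tag{A.6}$$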

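In order to combine the three estimates (A.4), (A.5) and (A.6), let us introduce
a new function by $\Psi_\beta(t) := \Phi_\beta(t^2)/t^2$. Since $\Phi_\beta$ is convex, we conclude
that $\Psi_\beta$ is monotonically increasing on the interval $(0, c_\beta]$. Hence, by
(A.4), which may be rewritten as $\|w \cdot \psi_\alpha \cdot \varphi_\beta\|^{1/2}/\rho^{1/2} \le \|w \cdot \psi_\alpha \cdot \varphi_\beta\|/\|w \cdot \psi_\alpha\|$
$(\le \varphi_\beta(0))$, the monotonicity of $\Psi_\beta$ and (A.6),

$$\Psi_\beta\Big(\frac{\gamma_\beta \cdot \|w \cdot \psi_\alpha \cdot \varphi_\beta\|^{1/2}}{\rho^{1/2}}\Big) \le \Psi_\beta\Big(\frac{\gamma_\beta \cdot \|w \cdot \psi_\alpha \cdot \varphi_\beta\|}{\|w \cdot \psi_\alpha\|}\Big) \le \frac{\|\mathcal{F}f_\varepsilon \cdot \psi_\alpha\|^2}{\gamma_\beta^2 \cdot \|w \cdot \psi_\alpha \cdot \varphi_\beta\|^2}.$$

Multiplying by $\gamma_\beta^2 \cdot \|w \cdot \psi_\alpha \cdot \varphi_\beta\|/\rho$ and exploiting (A.5) yields

$$\Phi_\beta\Big(\frac{\gamma_\beta^2 \cdot \|w \cdot \psi_\alpha \cdot \varphi_\beta\|}{\rho}\Big) \le \frac{\|\mathcal{F}f_\varepsilon \cdot \psi_\alpha\|^2}{\rho \cdot \|w \cdot \psi_\alpha \cdot \varphi_\beta\|} \le \alpha. \tag{A.7}$$

Since $\Phi_\beta^{-1}(s) = |\ln(s)|^{-\beta}$, we obtain (A.2) by combining (A.4) and (A.7).

The proof of (A.3) follows line by line the proof of (A.2) using the concave
index function $\kappa$ and its convex inverse function $\Phi$ rather than $\kappa_\beta$ and $\Phi_\beta$.
□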
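
The convexity range of $\Phi_\beta(s) = e^{-s^{-1/\beta}}$ used in the proof of (A.2) can be checked numerically. The following is a minimal sketch with hypothetical function names; it uses a central second difference as a stand-in for the second derivative and compares its sign on both sides of the threshold $(1+\beta)^{-\beta}$.

```python
import math


def phi_beta(s, beta):
    """Inverse of kappa_beta(t) = |log t|^(-beta): Phi_beta(s) = exp(-s^(-1/beta))."""
    return math.exp(-s ** (-1.0 / beta))


def convexity_threshold(beta):
    """Right end point (1 + beta)^(-beta) of the interval on which Phi_beta is convex."""
    return (1.0 + beta) ** (-beta)


def second_difference(s, beta, h=1e-3):
    """Central second difference of Phi_beta at s; its sign indicates local convexity."""
    return phi_beta(s + h, beta) - 2.0 * phi_beta(s, beta) + phi_beta(s - h, beta)


if __name__ == "__main__":
    for beta in (1.0, 2.0):
        c2 = convexity_threshold(beta)
        below, above = 0.5 * c2, min(4.0 * c2, 0.9)
        print(beta, c2, second_difference(below, beta) > 0, second_difference(above, beta) < 0)
```

For $\beta = 1$ the threshold is $1/2$, and the second difference changes sign there, in line with the explicit second derivative of $e^{-1/s}$.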

Proof of Theorem 3.2. The proof is based on the decomposition
(3.1). The polynomial source condition (3.2) equals assumption (i) in Lemma
A.1 with $w = \ell_s$, therefore from (A.1) we obtain $\|f_{Xs}^{\alpha} - f_X\|_s^2 \le \alpha^{\beta} \cdot \rho^2$.
Balancing the two terms on the right-hand side in (3.1) then gives the result.
□

Proof of Corollary 3.3. Under the conditions of the corollary, we
have $f_Y \in H_{p+a}$ and, hence, $\mathbb{E}\|\hat f_Y - f_Y\|^2 = O(n^{-2(p+a)/(2(p+a)+1)})$. Moreover,
the polynomial source condition (3.2) is satisfied with $\beta = (p - s)/(a + s)$.
Therefore, the result follows from Theorem 3.2. □

Proof of Theorem 3.4. The proof is similar to the proof of Theorem
3.2, but uses (A.2) in Lemma A.1 with $w = \ell_s$ rather than (A.1). The conditions
of the theorem then provide $\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 \le C(\mathbb{E}\|\hat f_Y - f_Y\|^2)^{1/2} +
C|\log(\mathbb{E}\|\hat f_Y - f_Y\|^2)|^{-\beta}$, for some constant $C > 0$, depending only on $\rho$ given
in (3.4), $\beta$ and $c$, which implies the result. □

Proof of Corollary 3.5. Under the conditions of the corollary, we
have $f_Y \in H_r$ and, hence, $\mathbb{E}\|\hat f_Y - f_Y\|^2 = O(n^{-2r/(2r+1)})$. Moreover, the
logarithmic source condition (3.4) is satisfied with $\beta = (p - s)/a$. Therefore, the
result follows from Theorem 3.4. □

Proof of Theorem 3.6. The proof is similar to the proof of Theorem
3.2, but uses (A.3) in Lemma A.1 with $w = \ell_s$ rather than (A.1). The condition
on $\alpha$, which may be rewritten as $c \cdot \mathbb{E}\|\hat f_Y - f_Y\|^2 = \alpha \cdot \kappa(\alpha)$, then ensures
the balance of the two terms in (3.1). The result follows by making use of
the relation $\omega(c \cdot \delta) \le (c \vee 1) \cdot \omega(\delta)$ (Mair and Ruymgaart [26], Remark 3.7).
□

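Lemma A.2. Suppose $w : \mathbb{R} \to [1, \infty)$ is an arbitrary weight function, $\kappa$
is a concave index function and $\widehat{\mathcal{F}f_\varepsilon}$ is the estimator defined in (2.7). Then,
for all $\gamma \ge 0$ and $t \in \mathbb{R}$, we have

$$\mathbb{E}\big|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t) - [\mathcal{F}f_\varepsilon](t)/w(t)\big|^{2\gamma} \le C(\gamma) \cdot m^{-\gamma}, \tag{A.8}$$

$$\mathbb{E}\Big[\mathbb{1}\{|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t)|^2 \ge \alpha\} \cdot |[\mathcal{F}f_\varepsilon](t)/w(t)|^{2\gamma} \cdot \frac{|[\widehat{\mathcal{F}f_\varepsilon}](t) - [\mathcal{F}f_\varepsilon](t)|^2}{|[\widehat{\mathcal{F}f_\varepsilon}](t)|^2}\Big] \le C(\gamma)\Big\{\frac{m^{-1-\gamma}}{\alpha} + \frac{m^{-\gamma \wedge 1}}{(m \cdot \alpha)^{1 - \gamma \wedge 1}}\Big\}, \tag{A.9}$$

$$\mathbb{E}\Big[\mathbb{1}\{|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t)|^2 \ge \alpha\} \cdot \kappa(|[\mathcal{F}f_\varepsilon](t)/w(t)|^2) \cdot \frac{|[\widehat{\mathcal{F}f_\varepsilon}](t) - [\mathcal{F}f_\varepsilon](t)|^2}{|[\widehat{\mathcal{F}f_\varepsilon}](t)|^2}\Big] \le C\Big\{\frac{\kappa(1/m)}{\alpha \cdot m} + \kappa(1/m)\Big\}, \tag{A.10}$$

where $C$ and $C(\gamma)$, depending only on $\gamma$, are positive constants.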

Proof. Let $\gamma \ge 0$ and $t \in \mathbb{R}$. Define $Z_j := \{(2\pi)^{-1/2}e^{-it\varepsilon_j} - [\mathcal{F}f_\varepsilon](t)\}/w(t)$,
$j = 1, \ldots, m$, then $Z_1, \ldots, Z_m$ are i.i.d. random variables with mean
zero, and $|Z_j|^{2\gamma} \le K$ for some positive constant $K$. Therefore, applying
Theorem 2.10 in Petrov [35], we obtain (A.8) for $\gamma \ge 1$, while for $\gamma \in (0, 1)$ the
estimate follows from Lyapunov's inequality.
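
The $m^{-1}$ decay in (A.8) for $\gamma = 1$ can be illustrated by simulation. The sketch below assumes standard normal errors, $w \equiv 1$, and an empirical Fourier transform of the form $m^{-1}\sum_j (2\pi)^{-1/2} e^{-it\varepsilon_j}$, which is suggested by the definition of $Z_j$ above but is an assumption here because (2.7) is not reproduced in this section. All function names are hypothetical. For this setting the mean squared error equals $(1 - e^{-t^2})/(2\pi m)$ exactly, which the Monte Carlo average should approximate.

```python
import cmath
import math
import random


def empirical_ft(sample, t):
    """Assumed form of the estimator: average of (2*pi)^(-1/2) * exp(-i t eps_j)."""
    scale = 1.0 / math.sqrt(2.0 * math.pi)
    return scale * sum(cmath.exp(-1j * t * e) for e in sample) / len(sample)


def simulated_mse(t, m, reps, rng):
    """Monte Carlo estimate of E|estimate - truth|^2 for N(0, 1) errors."""
    truth = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(m)]
        total += abs(empirical_ft(sample, t) - truth) ** 2
    return total / reps


def exact_mse(t, m):
    """Exact value (1 - |phi(t)|^2) / (2 * pi * m) with phi(t) = exp(-t^2 / 2)."""
    return (1.0 - math.exp(-t * t)) / (2.0 * math.pi * m)


if __name__ == "__main__":
    rng = random.Random(0)
    for m in (25, 50, 100):
        print(m, simulated_mse(1.0, m, 2000, rng), exact_mse(1.0, m))
```

Doubling $m$ halves the exact value, which is the $\gamma = 1$ case of the bound $C(\gamma) \cdot m^{-\gamma}$.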

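Proof of (A.9). Consider, for $\gamma \ge 0$ and $t \in \mathbb{R}$, the elementary inequality

$$1 \le 2^{2\gamma}\Big\{\frac{|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t) - [\mathcal{F}f_\varepsilon](t)/w(t)|^{2\gamma}}{|[\mathcal{F}f_\varepsilon](t)/w(t)|^{2\gamma}} + \frac{|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t)|^{2\gamma}}{|[\mathcal{F}f_\varepsilon](t)/w(t)|^{2\gamma}}\Big\}, \tag{A.11}$$

which together with $|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t)| \le 1$, for all $t \in \mathbb{R}$, implies

$$\mathbb{E}\Big[\mathbb{1}\{|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t)|^2 \ge \alpha\} \cdot |[\mathcal{F}f_\varepsilon](t)/w(t)|^{2\gamma} \cdot \frac{|[\widehat{\mathcal{F}f_\varepsilon}](t) - [\mathcal{F}f_\varepsilon](t)|^2}{|[\widehat{\mathcal{F}f_\varepsilon}](t)|^2}\Big]$$
$$\le 2^{2\gamma}\Big\{\frac{\mathbb{E}|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t) - [\mathcal{F}f_\varepsilon](t)/w(t)|^{2(1+\gamma)}}{\alpha} + \frac{\mathbb{E}|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t) - [\mathcal{F}f_\varepsilon](t)/w(t)|^{2}}{\alpha^{1 - \gamma \wedge 1}}\Big\},$$

and by using (A.8) we obtain the estimate (A.9).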

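Proof of (A.10). If $|[\mathcal{F}f_\varepsilon](t)/w(t)|^2 \le 1/m$, then we obtain (A.10) by using
(A.8) with $\gamma = 1$ together with $\kappa(|[\mathcal{F}f_\varepsilon](t)/w(t)|^2) \le \kappa(1/m)$. Since $\kappa$
is concave, we conclude that $g(t) = \kappa(t^2)/t^2$ is monotonically decreasing.
Hence, if $|[\mathcal{F}f_\varepsilon](t)/w(t)|^2 > 1/m$, then due to the monotonicity of $g$ we
have $\kappa(|[\mathcal{F}f_\varepsilon](t)/w(t)|^2) \cdot |[\mathcal{F}f_\varepsilon](t)/w(t)|^{-2} \le m\,\kappa(m^{-1})$, which together with
inequality (A.11), for $\gamma = 1$, yields

$$\mathbb{E}\Big[\mathbb{1}\{|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t)|^2 \ge \alpha\} \cdot \kappa(|[\mathcal{F}f_\varepsilon](t)/w(t)|^2) \cdot \frac{|[\widehat{\mathcal{F}f_\varepsilon}](t) - [\mathcal{F}f_\varepsilon](t)|^2}{|[\widehat{\mathcal{F}f_\varepsilon}](t)|^2}\Big]$$
$$\le 2^{2} \cdot \frac{\kappa(|[\mathcal{F}f_\varepsilon](t)/w(t)|^2)}{|[\mathcal{F}f_\varepsilon](t)/w(t)|^2}\Big\{\frac{\mathbb{E}|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t) - [\mathcal{F}f_\varepsilon](t)/w(t)|^{4}}{\alpha} + \mathbb{E}|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t) - [\mathcal{F}f_\varepsilon](t)/w(t)|^{2}\Big\},$$

and by using (A.8) we obtain the estimate (A.10). □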



Proof of Proposition 4.1. The proof is based on the decomposition
(4.1). Due to (A.9) in Lemma A.2, we show below the bound

$$\mathbb{E}\|\hat f_{Xs} - \hat f_{Xs}^{\alpha}\|_s^2 \le \pi^{-1}\alpha^{-1} \cdot \mathbb{E}\|\hat f_Y - f_Y\|^2 + 2C(0) \cdot \|f_X\|_s^2 \cdot \alpha^{-1} \cdot m^{-1}, \tag{A.12}$$

while from Lebesgue's dominated convergence theorem and (A.8) in Lemma
A.2, we conclude

$$\mathbb{E}\|\hat f_{Xs}^{\alpha} - f_X\|_s^2 = o(1) \quad \text{as } \alpha \to 0 \text{ and } m \to \infty. \tag{A.13}$$

Therefore, the conditions on $\alpha$ ensure the convergence to zero of the two
terms on the right-hand side in (4.1) as $n$ and $m$ tend to $\infty$, which gives the
result.

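Proof of (A.12). Using $\alpha^{-1} \ge \sup_{t \in \mathbb{R}^+} t^{-1}\mathbb{1}\{t \ge \alpha\}$, we have

$$\mathbb{E}\|\hat f_{Xs} - \hat f_{Xs}^{\alpha}\|_s^2 \le \pi^{-1}\alpha^{-1} \cdot \mathbb{E}\|\mathcal{F}\hat f_Y - \mathcal{F}f_Y\|^2 + 2\,\Big\|\Big(\mathbb{E}\Big[\mathbb{1}\{|\widehat{\mathcal{F}f_\varepsilon}/\ell_s|^2 \ge \alpha\} \cdot \frac{|\widehat{\mathcal{F}f_\varepsilon}/\ell_s - \mathcal{F}f_\varepsilon/\ell_s|^2}{|\widehat{\mathcal{F}f_\varepsilon}/\ell_s|^2}\Big]\Big)^{1/2} \times \ell_s \cdot \mathcal{F}f_X\Big\|^2, \tag{A.14}$$

and hence $\|\ell_s \cdot \mathcal{F}f_X\| = \|f_X\|_s < \infty$, together with (A.9) in Lemma
A.2 with $w = \ell_s$ and $\gamma = 0$, implies (A.12).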

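Proof of (A.13). If $f_X \in H_p$, $p \ge s \ge 0$, then by making use of the relation

$$\|\hat f_{Xs}^{\alpha} - f_X\|_s^2 = \|\mathbb{1}\{|\widehat{\mathcal{F}f_\varepsilon}/\ell_s|^2 < \alpha\} \cdot \ell_s \cdot \mathcal{F}f_X\|^2 \le \|\ell_s \cdot \mathcal{F}f_X\|^2 \le \|f_X\|_p^2 < \infty,$$

the result follows due to Lebesgue's dominated convergence theorem from
$\mathbb{E}\,\mathbb{1}\{|[\widehat{\mathcal{F}f_\varepsilon}](t)/\ell_s(t)|^2 < \alpha\} \to 0$ as $\alpha \to 0$ and $m \to \infty$, which can be seen
as follows. For all $\alpha \le \alpha_0$, we have $|[\mathcal{F}f_\varepsilon](t)| \ge 2\alpha^{1/2}\ell_s(t)$ and, hence,
$\mathbb{E}\,\mathbb{1}\{|[\widehat{\mathcal{F}f_\varepsilon}](t)/\ell_s(t)|^2 < \alpha\} \le P(|[\widehat{\mathcal{F}f_\varepsilon}](t) - [\mathcal{F}f_\varepsilon](t)| > |[\mathcal{F}f_\varepsilon](t)|/2)$. Therefore,
from Chebyshev's inequality and (A.8) in Lemma A.2 with $w \equiv 1$ and $\gamma = 1$,
we obtain (A.13). □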

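The next lemma summarizes the essential bounds of the "regularization
error" depending on the polynomial, logarithmic or general source condition.

Lemma A.3. Let $w : \mathbb{R} \to [1, \infty)$ be an arbitrary weight function, and let
$\widehat{\mathcal{F}f_\varepsilon}$ be the estimator defined in (2.7). Suppose there exists $\beta > 0$ such that:

(i) $\rho := \|w \cdot \mathcal{F}f_X \cdot (|\mathcal{F}f_\varepsilon|^2/w^2)^{-\beta/2}\| < \infty$ is satisfied, then
$$\mathbb{E}\|w \cdot \mathcal{F}f_X \cdot \mathbb{1}\{|\widehat{\mathcal{F}f_\varepsilon}/w|^2 < \alpha\}\|^2 \le C_\beta\{\alpha^{\beta} + m^{-\beta}\}\rho^2; \tag{A.15}$$

(ii) $\rho := \|w \cdot \mathcal{F}f_X \cdot |\log(|\mathcal{F}f_\varepsilon/w|^2)|^{\beta/2}\| < \infty$ is satisfied, then
$$\mathbb{E}\|w \cdot \mathcal{F}f_X \cdot \mathbb{1}\{|\widehat{\mathcal{F}f_\varepsilon}/w|^2 < \alpha\}\|^2 \le C_\beta|\log(C_\beta\{\alpha + m^{-1}\})|^{-\beta}\rho^2; \tag{A.16}$$

(iii) $\rho := \|w \cdot \mathcal{F}f_X \cdot \kappa(|\mathcal{F}f_\varepsilon/w|^2)^{-1/2}\| < \infty$, and assume that the index
function $\kappa$ is concave, then
$$\mathbb{E}\|w \cdot \mathcal{F}f_X \cdot \mathbb{1}\{|\widehat{\mathcal{F}f_\varepsilon}/w|^2 < \alpha\}\|^2 \le C_\kappa \cdot \kappa(C_\kappa\{\alpha + m^{-1}\}) \cdot \rho^2; \tag{A.17}$$

where $C_\beta$, $C_\kappa$ are positive constants depending only on $\beta$ and $\kappa$, respectively.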

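Proof. Denote $\hat\psi_\alpha := \mathcal{F}f_X \cdot \mathbb{1}\{|\widehat{\mathcal{F}f_\varepsilon}/w|^2 < \alpha\}$. Then, using the inequality
(A.11) together with $\alpha^{\gamma} \ge \sup_{t \in \mathbb{R}^+} t^{\gamma}\mathbb{1}\{t < \alpha\}$, for all $\gamma > 0$, we have

$$\|w \cdot \hat\psi_\alpha\|^2 \le 2^{2\gamma}\{\alpha^{\gamma} \cdot \rho^2 + \|w \cdot \mathcal{F}f_X \cdot |\mathcal{F}f_\varepsilon/w|^{-\gamma} \cdot |\widehat{\mathcal{F}f_\varepsilon}/w - \mathcal{F}f_\varepsilon/w|^{\gamma}\|^2\}.$$

Therefore, using (A.8) in Lemma A.2, we obtain the bound (A.15).

The proof of (A.16) follows along the same lines as the proof of (A.2) in
Lemma A.1. Consider the functions $\kappa_\beta$, $\varphi_\beta$ and $\Phi_\beta$ defined in the proof of
(A.2) in Lemma A.1, then in analogy to (A.4), we bound

$$\|w \cdot \hat\psi_\alpha\|^2 \le \|w \cdot \hat\psi_\alpha \cdot \varphi_\beta\| \cdot \rho, \tag{A.18}$$

which implies

$$\mathbb{E}\|w \cdot \hat\psi_\alpha\|^2 \le (\mathbb{E}\|w \cdot \hat\psi_\alpha \cdot \varphi_\beta\|^2)^{1/2} \cdot \rho. \tag{A.19}$$

Moreover, following the steps in (A.5) together with (A.18), we have

$$\|\widehat{\mathcal{F}f_\varepsilon} \cdot \hat\psi_\alpha\|^2 \le \alpha \cdot \|w \cdot \hat\psi_\alpha \cdot \varphi_\beta\| \cdot \rho. \tag{A.20}$$

Therefore, applying the triangle inequality together with (A.20), we obtain

$$\mathbb{E}\|\mathcal{F}f_\varepsilon \cdot \hat\psi_\alpha\|^2 \le 2\,\mathbb{E}\|w \cdot |\mathcal{F}f_\varepsilon/w - \widehat{\mathcal{F}f_\varepsilon}/w| \cdot \hat\psi_\alpha\|^2 + 2\alpha(\mathbb{E}\|w \cdot \hat\psi_\alpha \cdot \varphi_\beta\|^2)^{1/2}\rho.$$

By applying the Cauchy-Schwarz inequality and then (A.8) in Lemma A.2,
we bound the first term by $C(\beta) \cdot m^{-1} \cdot \int (\mathbb{E}\,\mathbb{1}\{|[\widehat{\mathcal{F}f_\varepsilon}](t)/w(t)|^2 < \alpha\})^{1/2} \cdot
w^2(t) \cdot |[\mathcal{F}f_X](t)|^2\,dt$, and using once again the Cauchy-Schwarz inequality,

$$\mathbb{E}\|\mathcal{F}f_\varepsilon \cdot \hat\psi_\alpha\|^2 \le 2\{C(\beta) \cdot m^{-1} + \alpha\} \cdot (\mathbb{E}\|w \cdot \hat\psi_\alpha \cdot \varphi_\beta\|^2)^{1/2} \cdot \rho. \tag{A.21}$$

In analogy to (A.6), by applying the convex function $\Phi_\beta$, we obtain

$$\Phi_\beta\Big(\frac{\gamma_\beta^2 \cdot \mathbb{E}\|w \cdot \hat\psi_\alpha \cdot \varphi_\beta\|^2}{\mathbb{E}\|w \cdot \hat\psi_\alpha\|^2}\Big) \le \frac{\mathbb{E}\|\mathcal{F}f_\varepsilon \cdot \hat\psi_\alpha\|^2}{\mathbb{E}\|w \cdot \hat\psi_\alpha\|^2}. \tag{A.22}$$

Combining the three bounds (A.19), (A.21) and (A.22), as in (A.7), implies

$$\Phi_\beta\Big(\frac{\gamma_\beta^2 \cdot (\mathbb{E}\|w \cdot \hat\psi_\alpha \cdot \varphi_\beta\|^2)^{1/2}}{\rho}\Big) \le 2\{C(\beta) \cdot m^{-1} + \alpha\}. \tag{A.23}$$

We obtain the second bound (A.16) by combining (A.19) and (A.23).

The proof of (A.17) follows line by line the proof of (A.16) using the
functions $\kappa$ and $\Phi$ rather than $\kappa_\beta$ and $\Phi_\beta$. □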

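The next lemma generalizes Theorem 3.1 given in Neumann [31] by providing
a lower bound for the MISE under a general source condition, which
requires for $f \in L^2(\mathbb{R})$ and index function $\kappa$ the following definitions:

$$\mathcal{M}_f^{\rho} := \{g \in \mathcal{D} : \|\mathcal{F}g \cdot \kappa(|\mathcal{F}f|^2)^{-1/2}\| \le \rho\}, \quad \rho > 0,$$
$$\Delta_f^{m}(t) := \kappa(|[\mathcal{F}f](t)|^2) \cdot \{m^{-1}|[\mathcal{F}f](t)|^{-2} \wedge 1\}, \quad t \in \mathbb{R}. \tag{A.24}$$

Lemma A.4. Suppose the samples from $f_Y$ and $f_\varepsilon$ are independent. Let
$f \in \mathcal{D}$, and consider $\mathcal{D}_f^{\gamma}$ defined in (4.2). Then, there exists $C > 0$, such that

$$\inf_{\tilde f_X}\ \sup_{f_X \in \mathcal{M}_f^{\rho},\ f_\varepsilon \in \mathcal{D}_f^{\gamma}} \mathbb{E}\|\tilde f_X - f_X\|^2 \ \ge\ C \cdot \max_{t \in \mathbb{R}} \Delta_f^{m}(t).$$

Proof. The proof is in analogy to the proof of Theorem 3.1 given in
Neumann [31] and we omit the details. □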

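Proof of Theorem 4.2. The proof is based on the decomposition
(4.1). From the bound given in (A.14), the polynomial source condition (3.2)
and (A.9) in Lemma A.2 with $w = \ell_s$ and $\gamma = \beta$, we obtain
$\mathbb{E}\|\hat f_{Xs} - \hat f_{Xs}^{\alpha}\|_s^2 \le \pi^{-1}\alpha^{-1} \cdot \mathbb{E}\|\hat f_Y - f_Y\|^2 + 2C(\beta) \cdot \rho^2 \cdot \{\alpha^{-1} \cdot m^{-1-\beta} + (m \cdot \alpha)^{-1+\beta \wedge 1} \cdot m^{-\beta \wedge 1}\}$.
While (A.15) in Lemma A.3 with $w = \ell_s$ and $\gamma = \beta$ provides $\mathbb{E}\|\hat f_{Xs}^{\alpha} - f_X\|_s^2 \le
C_\beta \cdot \{\alpha^{\beta} + m^{-\beta}\} \cdot \rho^2$. Balancing these two terms then gives the result. □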

Proof of Proposition 4.3. Let $g^{s}$ be defined by $\mathcal{F}g^{s} := \ell_s \cdot \mathcal{F}g$, $s \in \mathbb{R}$.
Now, by making use of the relation $\|f^{s}\| = \|f\|_s$, the $H_s$-risk of an
estimator $\tilde f_X$ of $f_X$ equals the MISE of $\tilde f_X^{s}$ as estimator of $f_X^{s}$. Moreover, $f_X$
belongs to $\mathcal{S}_f^{\rho}$ if and only if $f_X^{s}$ satisfies $\|\mathcal{F}f_X^{s} \cdot (|\mathcal{F}f^{-s}|^2)^{-\beta/2}\| \le \rho$. Consider
the sets $\mathcal{D}_f^{\gamma}$ and $\mathcal{M}_f^{\rho}$ defined in (4.2) and (A.24) with $\kappa(t) = t^{\beta}$, respectively.
Then, for any /o G ^/-si c > 0, Lemma A. 4 impUes 

inf sup E\\fx-fxfs>^S^ sup E\\f^-fxf 
fx fx^s^jJ.ev) f^^M^J^eV^l 

>Cmax{|[^/](.)P^{-^^Al}}^ 

where the lower bound is of order $m^{-(1 \wedge \beta)}$, which proves the result. □

Proof of Corollary 4.4. The proof is similar to the proof of Corol- 
lary 3.3, but uses Theorem 4.2 rather than Theorem 3.2, and we omit the 
details. □ 

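Proof of Corollary 4.5. Let $f \in \mathcal{D}_a$, and consider the sets $\mathcal{D}_f^{\gamma}$ and
$\mathcal{S}_f^{\rho}$ defined in (4.2) and Proposition 4.3 with $\beta = (p - s)/(a + s)$, respectively.
If $f_\varepsilon \in \mathcal{D}_f^{\gamma}$, then $f_X \in H_p^{\rho}$ is equivalent to $f_X \in \mathcal{S}_f^{\rho'}$ for a suitably
rescaled radius $\rho'$. Therefore, Proposition 4.3 leads to the following lower bound:

$$\inf_{\tilde f_X}\ \sup_{f_X \in H_p^{\rho},\ f_\varepsilon \in \mathcal{D}_a} \mathbb{E}\|\tilde f_X - f_X\|_s^2 \ \ge\ C m^{-(1 \wedge (p-s)/(a+s))}.$$

The result now follows by combination of the last lower bound with the
lower bound in the case of known $f_\varepsilon \in \mathcal{D}_a$ (cf. Mair and Ruymgaart [26]),
that is, $\inf_{\tilde f_X} \sup_{f_X \in H_p^{\rho},\ f_\varepsilon \in \mathcal{D}_a} \mathbb{E}\|\tilde f_X - f_X\|_s^2 \ge C n^{-2(p-s)/(2(p+a)+1)}$. □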

Proof of Theorem 4.6. Considering the decomposition (4.1), we
bound the first term as in (A.12), and from (A.16) in Lemma A.3 with
$w = \ell_s$ and $\gamma = \beta$, the second term satisfies $\mathbb{E}\|\hat f_{Xs}^{\alpha} - f_X\|_s^2 \le C_\beta|\log(C_\beta\{\alpha +
m^{-1}\})|^{-\beta}\rho^2$. The conditions of the theorem then provide $\mathbb{E}\|\hat f_{Xs} - f_X\|_s^2 \le
C \cdot \{\mathbb{E}\|\hat f_Y - f_Y\|^2 \vee m^{-1}\}^{1/2} + C \cdot |\log(C \cdot \{\mathbb{E}\|\hat f_Y - f_Y\|^2 \vee m^{-1}\})|^{-\beta}$, for some
constant $C > 0$ depending only on $\rho$ given in (3.4), $\beta$ and $c$, which implies
the result. □



Proof of Proposition 4.7. The proof follows along the same lines
as the proof of Proposition 4.3. Here, using the logarithmic rather than the
polynomial source condition, Lemma A.4 implies

$$\inf_{\tilde f_X}\ \sup_{f_X \in \mathcal{S}_f^{\rho},\ f_\varepsilon \in \mathcal{D}_f^{\gamma}} \mathbb{E}\|\tilde f_X - f_X\|_s^2 \ \ge\ C \max_{t \in \mathbb{R}}\big\{|\log(|[\mathcal{F}f](t)|^2)|^{-\beta}\{m^{-1}|[\mathcal{F}f](t)|^{-2} \wedge 1\}\big\},$$

where the lower bound is of order $(\log m)^{-\beta}$, which gives the result. □

Proof of Corollary 4.8. The proof is similar to the proof of Corol- 
lary 3.5, but uses Theorem 4.6 rather than Theorem 3.4, and we omit the 
details. □ 

Proof of Corollary 4.9. The proof follows along the same lines as
the proof of Corollary 4.5. Here, using Proposition 4.7 rather than Proposition
4.3 leads to the lower bound $C(\log m)^{-(p-s)/a}$. The result then follows
from the lower bound $C(\log n)^{-(p-s)/a}$ in the case of known $f_\varepsilon$ (cf. Mair and
Ruymgaart [26]). □

Proof of Theorem 4.10. The proof is based on the decomposition
(4.1). From the bound given in (A.14), the general source condition (3.6)
and (A.10) in Lemma A.2 with $w = \ell_s$, we obtain $\mathbb{E}\|\hat f_{Xs} - \hat f_{Xs}^{\alpha}\|_s^2 \le \pi^{-1}\alpha^{-1}\mathbb{E}\|\hat f_Y -
f_Y\|^2 + 2C\rho^2\{\alpha^{-1}m^{-1}\kappa(m^{-1}) + \kappa(m^{-1})\}$. While (A.17) in Lemma A.3 with
$w = \ell_s$ provides $\mathbb{E}\|\hat f_{Xs}^{\alpha} - f_X\|_s^2 \le C_\kappa\,\kappa(C_\kappa\{\alpha + m^{-1}\})\rho^2$. The condition on $\alpha$
then ensures the balance of these two terms. The result follows by making
use of the relation $\kappa(c \cdot \delta) \le (c \vee 1) \cdot \kappa(\delta)$, which follows, for $c \le 1$ and for
$c > 1$, from the monotonicity and the concavity of $\kappa$, respectively. □

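Proof of Proposition 4.11. The proof follows along the same lines
as the proof of Proposition 4.3. Here, using the general rather than the
polynomial source condition, Lemma A.4 implies

$$\inf_{\tilde f_X}\ \sup_{f_X \in \mathcal{S}_f^{\rho},\ f_\varepsilon \in \mathcal{D}_f^{\gamma}} \mathbb{E}\|\tilde f_X - f_X\|_s^2 \ \ge\ C \max_{t \in \mathbb{R}}\Big\{\kappa(|[\mathcal{F}f](t)|^2)\Big\{\frac{1}{m|[\mathcal{F}f](t)|^2} \wedge 1\Big\}\Big\}.$$

Since $\kappa$ is increasing and $\kappa(t^2)/t^2$ is decreasing, it follows that the lower
bound is of order $\kappa(1/m)$, which proves the result. □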
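
The last step, that the maximum of $\kappa(x)\{(mx)^{-1} \wedge 1\}$ over $x = |[\mathcal{F}f](t)|^2 \in (0, 1]$ is of order $\kappa(1/m)$, can be illustrated numerically. The following sketch is an illustration under assumptions, not the argument of the paper: it replaces the maximum over $t$ by a maximum over a grid of values $x$ that is assumed to contain $1/m$, and the function names are hypothetical.

```python
def profile(x, m, kappa):
    """Value kappa(x) * min(1 / (m * x), 1), with x standing for |F f (t)|^2."""
    return kappa(x) * min(1.0 / (m * x), 1.0)


def max_profile(m, kappa, grid_size=2000):
    """Maximise the profile over a log-spaced grid on [1e-8, 1] that also contains 1/m."""
    xs = [10.0 ** (-8.0 + 8.0 * k / grid_size) for k in range(grid_size + 1)]
    xs.append(1.0 / m)
    return max(profile(x, m, kappa) for x in xs)


if __name__ == "__main__":
    kappa = lambda t: t ** 0.5  # concave index function kappa(t) = t^beta with beta = 1/2
    for m in (10, 100, 1000):
        print(m, max_profile(m, kappa), kappa(1.0 / m))
```

For an increasing $\kappa$ with $\kappa(x)/x$ decreasing, the profile is increasing below $x = 1/m$ and decreasing above it, so the grid maximum coincides with $\kappa(1/m)$.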